3131 Commits

Author SHA1 Message Date
Olivier Chafik
efaa441233 fix llama-lookup-* Makefile rules 2024-06-08 14:26:11 +01:00
Olivier Chafik
b0eb3b88e9 rm bin files 2024-06-08 14:16:32 +01:00
Olivier Chafik
eef922e02e sort cmake example subdirs 2024-06-08 14:09:28 +01:00
Olivier Chafik
b648243496 add/fix gbnf-validator subfolder to cmake 2024-06-08 14:07:56 +01:00
Olivier Chafik
81222f02db prefix more cmake targets w/ llama- 2024-06-08 14:05:34 +01:00
Olivier Chafik
10650b692d rename {main->llama}-cmake-pkg binary 2024-06-08 13:57:06 +01:00
Olivier Chafik
78bca8cb07 fix main refs 2024-06-08 13:52:03 +01:00
Olivier Chafik
ab5efbb3b6 Prefix all example bins w/ llama- 2024-06-08 13:42:01 +01:00
Olivier Chafik
23d0df5bd5 main: target name -> llama-cli 2024-06-08 12:50:35 +01:00
Olivier Chafik
fe93cc96cc Merge remote-tracking branch 'origin/master' into bins 2024-06-08 12:04:52 +01:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
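The slot selection described in this commit can be illustrated with a minimal sketch. It assumes each slot keeps the token list of its last prompt and that the server prefers the slot sharing the longest common prefix with the incoming prompt. The names `common_prefix_len`, `pick_slot`, and `min_prefix` are hypothetical and are not taken from the server code.

```python
from typing import List, Optional, Sequence


def common_prefix_len(a: Sequence, b: Sequence) -> int:
    """Count how many leading elements of a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(cached_prompts: List[Sequence[int]],
              new_prompt: Sequence[int],
              min_prefix: int = 1) -> Optional[int]:
    """Return the index of the slot whose cached prompt shares the longest
    prefix with new_prompt, or None if no slot reaches min_prefix tokens.

    min_prefix is a hypothetical threshold used only for this illustration.
    """
    best_idx: Optional[int] = None
    best_len = 0
    for idx, cached in enumerate(cached_prompts):
        lcp = common_prefix_len(cached, new_prompt)
        # Strictly greater keeps the earliest slot on ties.
        if lcp >= min_prefix and lcp > best_len:
            best_idx, best_len = idx, lcp
    return best_idx


slots = [[1, 2, 3], [1, 2, 9, 9], [7]]
print(pick_slot(slots, [1, 2, 3, 4]))
```

A prefix match, unlike a common substring found elsewhere in the prompt, corresponds to cached tokens that can be reused from the start of the sequence, which is one plausible reading of the switch from LCS to LCP mentioned above.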
slaren
da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
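The difference between binary and decimal multi-byte units can be shown with a short sketch. The helper `parse_size` and its `decimal` switch are hypothetical and only illustrate the arithmetic; they are not the tool's actual parsing code.

```python
def parse_size(text: str, decimal: bool = True) -> int:
    """Convert a size such as '500M' or '2G' into a byte count.

    decimal=True uses powers of 1000 (MB, GB);
    decimal=False uses powers of 1024 (MiB, GiB).
    """
    base = 1000 if decimal else 1024
    exponents = {"M": 2, "G": 3}
    suffix = text[-1].upper()
    if suffix not in exponents:
        raise ValueError(f"unsupported unit in {text!r}")
    return int(text[:-1]) * base ** exponents[suffix]


print(parse_size("2G"))                 # decimal interpretation
print(parse_size("2G", decimal=False))  # binary interpretation
```

With decimal units, a 2G limit is 2,000,000,000 bytes, roughly 7% smaller than the 2,147,483,648 bytes of the binary interpretation.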
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Olivier Chafik
0dba58269f Update server-llm.sh 2024-06-07 11:52:40 +01:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
ochafik
af8f0169da Update .gitignore 2024-06-07 10:14:03 +01:00
ochafik
7fbe6006c9 update straggling refs 2024-06-07 09:42:21 +01:00
ochafik
99df4cc091 rm accidentally checked in bins 2024-06-07 09:40:09 +01:00
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
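A check of the kind this commit describes can be sketched as a scan for non-finite values. The function name `first_non_finite` is hypothetical, and the sketch works on a plain list of floats rather than on the actual imatrix data structures.

```python
import math
from typing import Iterable, Optional


def first_non_finite(values: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite value, or None if all
    values are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


data = [0.5, 1.25, float("nan"), 3.0]
bad = first_non_finite(data)
if bad is not None:
    print(f"non-finite value at index {bad}")
```

Rejecting the data as soon as a bad value is found keeps NaN or inf entries from propagating into later computations.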
ochafik
fbd83131f5 Merge remote-tracking branch 'origin/master' into bins 2024-06-07 00:51:31 +01:00
ochafik
a0a7f2b031 Update build.yml 2024-06-07 00:38:05 +01:00
ochafik
8695baebc0 update more names 2024-06-07 00:21:01 +01:00
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00
Olivier Chafik
9a03341094 main/server: fix targets 2024-06-06 15:53:25 +01:00
Olivier Chafik
8b7c734473 main: update refs -> llama
fix examples/main ref
2024-06-06 15:44:51 +01:00
Olivier Chafik
f5f19a236f server: simplify nix package 2024-06-06 15:44:40 +01:00
Olivier Chafik
f298cc63d2 server: update refs -> llama-server
gitignore llama-server
2024-06-06 15:44:40 +01:00
Olivier Chafik
849842916d main/server: rename to llama / llama-server for consistency w/ homebrew 2024-06-06 15:28:27 +01:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.

* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Mattheus Chediak
a143c04375
README minor fixes (#7798) [no ci]
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
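The meaning of the `x{min,max}` operator can be illustrated by expanding it into required copies followed by nested optional groups. This is only a sketch of the semantics; the helper `expand_repetition` is hypothetical and is not claimed to match how the grammar parser rewrites rules internally.

```python
from typing import Optional


def expand_repetition(symbol: str, min_count: int,
                      max_count: Optional[int] = None) -> str:
    """Expand symbol{min_count,max_count} into an equivalent expression that
    uses only sequencing, grouping, '?' and '*'.

    max_count=None stands for an open upper bound, as in x{min,}.
    """
    if min_count < 0:
        raise ValueError("min_count must be non-negative")
    required = [symbol] * min_count
    if max_count is None:
        # min required copies, then any number of additional ones
        return " ".join(required + [f"{symbol}*"])
    if max_count < min_count:
        raise ValueError("max_count must be >= min_count")
    optional = ""
    for _ in range(max_count - min_count):
        # Wrap one more optional copy around what has been built so far.
        optional = f"({symbol} {optional})?" if optional else f"({symbol})?"
    parts = required + ([optional] if optional else [])
    return " ".join(parts)


print(expand_repetition("x", 2, 4))
```

Nesting the optional copies, rather than listing them side by side as `x? x?`, leaves only one way to match a given number of repetitions, in line with the commit's notes about keeping alternatives unambiguous.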
Joan Fontanals
f5d7b268ec
llama : add jina v2 base code (#7596)
* feat: add changes to handle jina v2 base code

* fix: do not complicate things

* fix: fix the usage of the code model

* fix: fix comments

* fix: fix linting issues

* fix: remove ollama patches

* style : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
2d08b7fbb4
docker : build only main and server in their images (#7782)
* add openmp lib to dockerfiles

* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
d67caea0d6
docker : add openmp lib (#7780) 2024-06-06 08:17:21 +03:00
Galunid
7672adeec7
Fix encoding in python scripts (#7733) 2024-06-06 03:07:24 +10:00
Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
b3092
2024-06-05 16:53:00 +02:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
b3091
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
jaime-m-p
c90dbe026b
Fix per token attributes bits (#7749) b3089 2024-06-05 01:26:14 +02:00
agray3
b90dc566c1
Allow number of nodes in CUDA graph to change (#7738)
Previously the code would have failed to cope in the case that the
number of nodes changes in an existing CUDA graph. This fixes the
issue by removing an unnecessary conditional.
b3088
2024-06-04 22:06:49 +02:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
b3087
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
b3086
2024-06-04 21:23:20 +03:00
Georgi Gerganov
0cd6bd3483
llama : remove beam search (#7736) b3085 2024-06-04 21:23:05 +03:00
Georgi Gerganov
5ca0944a15
readme : remove obsolete Zig instructions (#7471) b3084 2024-06-04 19:43:01 +03:00
slaren
adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
b3083
2024-06-04 14:32:42 +02:00
Daniele
987d743d6b
Improve hipBLAS support in CMake (#7696)
* Improve hipBLAS support in CMake

This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.

* Set ROCM_PATH correctly
b3082
2024-06-04 14:09:15 +02:00