Commit Graph

939 Commits

Author SHA1 Message Date
Olivier Chafik
e474ef1df4 update llama-rpc-server bin name + doc 2024-06-11 14:42:03 +01:00
Olivier Chafik
ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik
2a9c4cd7ba Merge remote-tracking branch 'origin/master' into bins 2024-06-11 02:35:01 +01:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
HanClinto
1f5ec2c0b4 Updating two small main references missed earlier in the finetune docs. 2024-06-10 16:12:50 -07:00
HanClinto
70de0debab Updating documentation references for lookup-merge and export-lora 2024-06-10 15:32:21 -07:00
HanClinto
72660c357c Updating run-with-preset.py to use new binary names.
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2 Updating a few lingering doc references for rename of main to llama-cli 2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2 Updating docs for eval-callback binary to use new llama- prefix. 2024-06-10 14:44:46 -07:00
Olivier Chafik
f9cfd04bd4 address gbnf-validator unused fread warning (switched to C++ / ifstream) 2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4 rename: llama-cli-cmake-pkg(.exe) 2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee fix test-eval-callback 2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812 more llama-cli(.exe) 2024-06-10 16:08:06 +01:00
Olivier Chafik
daeaeb1222 Merge remote-tracking branch 'origin/master' into bins 2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c rename llama|main -> llama-cli; consistent RPM bin prefixes 2024-06-10 15:34:14 +01:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik
b0eb3b88e9 rm bin files 2024-06-08 14:16:32 +01:00
Olivier Chafik
eef922e02e sort cmake example subdirs 2024-06-08 14:09:28 +01:00
Olivier Chafik
b648243496 add/fix gbnf-validator subfolder to cmake 2024-06-08 14:07:56 +01:00
Olivier Chafik
81222f02db prefix more cmake targets w/ llama- 2024-06-08 14:05:34 +01:00
Olivier Chafik
10650b692d rename {main->llama}-cmake-pkg binary 2024-06-08 13:57:06 +01:00
Olivier Chafik
78bca8cb07 fix main refs 2024-06-08 13:52:03 +01:00
Olivier Chafik
ab5efbb3b6 Prefix all example bins w/ llama- 2024-06-08 13:42:01 +01:00
Olivier Chafik
23d0df5bd5 main: target name -> llama-cli 2024-06-08 12:50:35 +01:00
Olivier Chafik
fe93cc96cc Merge remote-tracking branch 'origin/master' into bins 2024-06-08 12:04:52 +01:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
ochafik
7fbe6006c9 update straggling refs 2024-06-07 09:42:21 +01:00
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
ochafik
8695baebc0 update more names 2024-06-07 00:21:01 +01:00
Olivier Chafik
9a03341094 main/server: fix targets 2024-06-06 15:53:25 +01:00
Olivier Chafik
8b7c734473 main: update refs -> llama
fix examples/main ref
2024-06-06 15:44:51 +01:00
Olivier Chafik
f298cc63d2 server: update refs -> llama-server
gitignore llama-server
2024-06-06 15:44:40 +01:00
Olivier Chafik
849842916d main/server: rename to llama / llama-server for consistency w/ homebrew 2024-06-06 15:28:27 +01:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were removed in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov
0cd6bd3483
llama : remove beam search (#7736) 2024-06-04 21:23:05 +03:00
slaren
adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
nickp27
9422c5e34b
[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)
* Update rpc-server.cpp to include SYCL backend

Draft PR to address inclusion of SYCL backend for RPC server

* Update rpc-server.cpp
2024-06-02 12:13:54 +03:00