llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-13 05:42:22 +01:00

Author	SHA1	Message	Date
Eddie-Wang	841c903ff9	Merge branch 'ggerganov:master' into bitnet	2024-06-10 10:51:47 +08:00
Eddie-Wang	abd798d70f	fix code	2024-06-10 02:50:14 +00:00
Georgi Gerganov	10ceba354a	flake.lock: Update (#7838) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-09 16:04:50 -07:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833)	2024-06-09 20:19:35 +03:00
Eddie-Wang1120	65ac3a3627	fix	2024-06-10 00:06:09 +08:00
Eddie-Wang1120	344467f2b8	fix code	2024-06-10 00:00:52 +08:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
Eddie-Wang1120	97d22be58c	fix codestyle	2024-06-09 21:22:50 +08:00
root	3a0f8b0697	clean code 2	2024-06-09 21:15:02 +08:00
root	1c5a8b7fec	clean code	2024-06-09 20:22:03 +08:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830)	2024-06-09 20:50:35 +10:00
root	dbee0a86c1	move i2 to quantize	2024-06-09 18:20:32 +08:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824)	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matches both the prefix and the suffix of the model part names, which should fix this problem without breaking any previously-supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.	2024-06-09 12:47:25 +10:00
Eddie-Wang	ca09085593	move i2s to quantize v1	2024-06-09 02:43:38 +00:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False. Also, GGUFWriter no longer requires the output file name until actually writing to it. And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) This reverts commit 9422c5e34bbd302493b77a8f6d546154a1f4fe82.	2024-06-09 01:43:39 +02:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
Eddie-Wang	4e1ab50628	finish bitnet i2 e2e	2024-06-08 12:44:13 +00:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803)	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Eddie-Wang1120	2a01a7ce0d	remove unused	2024-06-07 18:29:59 +08:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745)	2024-06-07 11:15:49 +02:00
woodx	a5cabd7649	server : do not get prompt in infill mode (#7286) * avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <wudexiang@bytedance.com>	2024-06-07 10:09:45 +03:00
Eddie-Wang1120	5e59660173	finish f16 hf bitnet e2e	2024-06-07 14:42:52 +08:00
pengxin99	d5c938cd77	[SYCL] fix softmax r2r result wrong issue (#7811)	2024-06-07 14:28:26 +08:00
slaren	c9ee7118d5	check for nans in imatrix and quantize (#7807) * imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values	2024-06-07 09:01:29 +03:00
Georgi Gerganov	ee459f40f6	server : fix --threads-http arg (#7801)	2024-06-06 19:19:59 +03:00
Georgi Gerganov	f83351f9a6	imatrix : migrate to gpt_params (#7771) * imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl	2024-06-06 16:30:58 +03:00
Clint Herron	ad675e1c67	Added support for . (any character) token in grammar engine. (#6467) * Added support for . (any character) token in grammar engine. * Add integration tests for any-character symbol.	2024-06-06 06:08:52 -07:00
Mattheus Chediak	a143c04375	README minor fixes (#7798) [no ci] derievatives --> derivatives	2024-06-06 22:17:54 +10:00
Olivier Chafik	55b2d0849d	grammars: x{min,max} repetition operator (#6640) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com>	2024-06-06 10:07:06 +01:00
Joan Fontanals	f5d7b268ec	llama : add jina v2 base code (#7596) * feat: add changes to handle jina v2 base code * fix: do not complicate things * fix: fix the usage of the code model * fix: fix comments * fix: fix linting issues * fix: remove ollama patches * style : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-06 10:22:41 +03:00
slaren	2d08b7fbb4	docker : build only main and server in their images (#7782) * add openmp lib to dockerfiles * build only main and server in their docker images	2024-06-06 08:19:49 +03:00
slaren	d67caea0d6	docker : add openmp lib (#7780)	2024-06-06 08:17:21 +03:00
Eddie-Wang1120	1f2e0ee012	finish bitnet e2e	2024-06-06 12:28:11 +08:00
Galunid	7672adeec7	Fix encoding in python scripts (#7733)	2024-06-06 03:07:24 +10:00
Eddie-Wang	57dfc3bcdf	hf bitnet e2e v2	2024-06-05 16:01:05 +00:00
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716) * CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits b3092	2024-06-05 16:53:00 +02:00
Georgi Gerganov	2b3389677a	ggml : refactor rope norm/neox (#7634) * ggml : unify rope norm/neox (CPU) * ggml : fix compile warning * ggml : remove GLM rope mode ggml-ci * metal : better rope implementation ggml-ci * cuda : better rope implementation ggml-ci * naming : n_orig_ctx -> n_ctx_orig ggml-ci * dev : add reminders to update backends ggml-ci * vulkan : fix ggml_rope_ext() usage * cuda : fix array size + indents ggml-ci b3091	2024-06-05 11:29:20 +03:00
Eddie-Wang1120	076b4a197b	hf bitnet v1	2024-06-05 16:15:28 +08:00
arch-btw	9973e81c5c	readme : remove -ins (#7759) -ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675 I have adjusted the README accordingly. There was no trace of --chatml in the README.	2024-06-05 09:40:49 +03:00
jaime-m-p	c90dbe026b	Fix per token attributes bits (#7749) b3089	2024-06-05 01:26:14 +02:00
agray3	b90dc566c1	Allow number of nodes in CUDA graph to change (#7738) Previously the code would have failed to cope in the case that the number of nodes changes in an existing CUDA graph. This fixes the issue by removing an unnecessary conditional. b3088	2024-06-04 22:06:49 +02:00
Georgi Gerganov	1442677f92	common : refactor cli arg parsing (#7675) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params b3087	2024-06-04 21:23:39 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735) ggml-ci b3086	2024-06-04 21:23:20 +03:00
Georgi Gerganov	0cd6bd3483	llama : remove beam search (#7736) b3085	2024-06-04 21:23:05 +03:00
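
Note on commit 7a16ce7db2 (server slot selection using Longest Common Prefix): the log only names the technique, so the following is a minimal sketch of the general idea and not the server's actual code. It assumes each slot keeps the token ids of its cached prompt and that the slot sharing the longest prefix with the new prompt is preferred. The names `common_prefix_len` and `pick_slot` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two token sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(cached_prompts: Dict[int, List[int]],
              new_prompt: List[int]) -> Optional[int]:
    """Return the id of the slot whose cached prompt shares the longest
    prefix with new_prompt, or None if no slot shares any prefix.

    Hypothetical helper: the real server may apply extra criteria that
    this sketch ignores.
    """
    best_id: Optional[int] = None
    best_len = 0
    for slot_id, cached in cached_prompts.items():
        n = common_prefix_len(cached, new_prompt)
        if n > best_len:
            best_id, best_len = slot_id, n
    return best_id
```

A prefix match (rather than a common substring anywhere in the prompt) is the relevant measure under this assumption, because only the leading tokens of a cached prompt can be reused without recomputation.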


3134 Commits