llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-31 14:13:09 +01:00

Author	SHA1	Message	Date
Georgi Gerganov	af19d35734	server : OAI API compatibility (#4198) * Add openai-compatible POST /v1/chat/completions API endpoint to server example * fix code style * Update server README.md * Improve server README.md * Fix server.cpp code style according to review * server : some style changes * server : indentation * server : enable special tokens during tokenization by default * server : minor code style * server : change random string generator * straightforward /v1/models endpoint --------- Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com> Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>	2023-11-25 11:29:06 +02:00
slaren	e9c13ff781	llama : set metal log callback correctly (#4204)	2023-11-24 18:10:01 +01:00
slaren	8a052c131e	ggml-cuda : support stablelm rope (#4156) * ggml-cuda : support stablelm rope * remove unused freq_base kernel parameter * add n_dims parameter to llm_build_k_shift, default to n_rot via overload * llama : fix llm_build_k_shift args --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-24 18:04:31 +01:00
Galunid	189d68446e	convert : fix tensors using grad in some models (#4173)	2023-11-24 15:02:49 +01:00
eastriver	2568a4bf54	main.swift : fix eos checking (#4197) llama_token_eos(const struct llama_model *) is currently getting struct llama_context type variable context as a parameter.	2023-11-24 11:25:10 +02:00
Aaryaman Vasishta	b35f3d0def	readme : use PATH for Windows ROCm (#4195) * Update README.md to use PATH for Windows ROCm * Update README.md * Update README.md	2023-11-24 09:52:39 +02:00
Haohui Mai	55978ce09b	Fix incorrect format strings and uninitialized variables. (#4133) * Fix incorrect format strings and uninitialized variables. * Address comments * Add the missing include statement	2023-11-23 22:56:53 +01:00
Georgi Gerganov	6b0a7420d0	llama : KV cache view API + better KV cache management (#4170) * llama : keep track of used KV cells + better KV cache management * llama : zero KV cache used upon clear ggml-ci * llama : allow exporting a view of the KV cache (#4180) * Allow exporting a view of the KV cache * Allow dumping the sequences per cell in common * Track max contiguous cells value and position as well * Fix max contiguous empty cells index calculation Make dump functions deal with lengths or sequences counts > 10 better * Fix off by one error in dump_kv_cache_view * Add doc comments for KV cache view functions Eliminate cell sequence struct; use llama_seq_id directly Minor cleanups * common : add -dkvc arg for enabling kv cache dumps --------- Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>	2023-11-23 19:07:56 +02:00
Georgi Gerganov	d103d935c0	readme : update hot topics	2023-11-23 13:51:22 +02:00
Daniel Bevenius	9d5949f04b	examples : fix typo in parallel example doc comment (#4181) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-11-23 13:34:20 +02:00
Georgi Gerganov	ff8238f71d	docs : add llama-star arch idea	2023-11-23 11:35:04 +02:00
Galunid	8e672efe63	stablelm : simplify + speedup generation (#4153)	2023-11-21 16:22:30 +01:00
Galunid	0b871f1a04	finetune - update readme to mention llama support only (#4148)	2023-11-20 19:30:00 +01:00
Aaryaman Vasishta	dfc7cd48b1	readme : update ROCm Windows instructions (#4122) * Update README.md * Update README.md Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-20 17:02:46 +02:00
Seb C	881800d1f0	main : Add ChatML functionality to main example (#4046) Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>	2023-11-20 14:56:59 +01:00
Galunid	f23c0359a3	ci : add flake8 to github actions (python linting) (#4129) Disabled rules: * E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned * E211 Whitespace before '(' (E211) - disabled because we often use 'C' Style where values are aligned * E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned * E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard * E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned * E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned * E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard * E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard * E266 Too many leading '#' for block comment - sometimes used as "section" separator * E501 Line too long - disabled because it's broken so often it seems like a standard * E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use # noqa instead) * E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use # noqa instead)	2023-11-20 11:35:47 +01:00
Branden Butler	40a34fe8d0	speculative : fix prompt tokenization in speculative example (#4025) * Support special tokens and not adding BOS to prompt in speculative * Adapt to new should_add_bos function * Ensure tgt and dft have same add_bos setting	2023-11-20 11:50:04 +02:00
Georgi Gerganov	dae06c06e5	Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" This reverts commit `05e8301e45`.	2023-11-19 19:16:07 +02:00
Clark Saben	05e8301e45	finetune : add --n-gpu-layers flag info to --help (#4128)	2023-11-19 18:56:38 +02:00
SoftwareRenderer	936c79b227	server : relay error messages (#4131)	2023-11-19 18:54:10 +02:00
kchro3	262005ad9d	common : comma should be semicolon (#4137)	2023-11-19 18:52:57 +02:00
Georgi Gerganov	35985acffa	gitignore : tokenize	2023-11-19 18:50:49 +02:00
slaren	e937066420	gguf-py : export chat templates (#4125) * gguf-py : export chat templates * llama.cpp : escape new lines in gguf kv info prints * gguf-py : bump version * gguf-py : check chat_template type * gguf-py : initialize chat_template	2023-11-19 11:10:52 +01:00
Kerfuffle	28a2e6e7d4	tokenize example: Respect normal add BOS token behavior (#4126) Allow building with Makefile	2023-11-18 14:48:17 -07:00
Galunid	0b5c3b0457	scripts : Remove missed baichuan convert script (#4127)	2023-11-18 21:08:33 +01:00
Kerfuffle	2923f17f6f	Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) * ggml-cuda.cu: Clean up warnings when compiling with clang * ggml-cuda.cu: Move static items into anonymous namespace * ggml-cuda.cu: Fix use of namespace start macro * Revert "ggml-cuda.cu: Fix use of namespace start macro" This reverts commit `26c1149026`. * Revert "ggml-cuda.cu: Move static items into anonymous namespace" This reverts commit `e29757e0f7`.	2023-11-18 08:11:18 -07:00
slaren	bbecf3f415	llama : increase max nodes (#4115)	2023-11-17 21:39:11 +02:00
Roger Meier	8e9361089d	build : support ppc64le build for make and CMake (#3963) * build: support ppc64le build for make and CMake * build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__ Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-17 18:11:23 +02:00
Georgi Gerganov	5ad387e994	tokenize : fix trailing whitespace	2023-11-17 18:01:38 +02:00
zakkor	2fa02b4b3d	examples : add tokenize (#4039)	2023-11-17 17:36:44 +02:00
Don Mahurin	2ab0707acb	convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) Co-authored-by: Don Mahurin <@>	2023-11-17 17:32:34 +02:00
John	11173c92d6	py : Falcon HF compatibility (#4104) Falcon HF compatibility	2023-11-17 17:24:30 +02:00
Jannis Schönleber	9e87ef60e1	common : improve yaml log escaping (#4080) * logging: improve escaping in yaml output * logging: include review feedback	2023-11-17 17:24:07 +02:00
Huawei Lin	c7cce1246e	llava : fix compilation warning that fread return value is not used (#4069)	2023-11-17 17:22:56 +02:00
Jiří Podivín	f7d5e97542	py : remove superfluous import statements (#4076) Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Jiri Podivin <jpodivin@redhat.com>	2023-11-17 17:20:53 +02:00
Jiří Podivín	ba4cf5c0bf	train : move number of gpu layers argument parsing to common/train.cpp (#4074) - introduces help entry for the argument - cuts '--gpu-layers' form in order to simplify usage and documentation. Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Jiri Podivin <jpodivin@redhat.com>	2023-11-17 17:19:16 +02:00
slaren	e85bb1a8e7	llama : add functions to get the model's metadata (#4013) * llama : add functions to get the model's metadata * format -> std::to_string * better documentation	2023-11-17 17:17:37 +02:00
gwjr	3e916a07ac	finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) * Remove logically superfluous assertions and order by dimension * Use cblas_sgemm() to implement ggml_compute_forward_out_prod() * Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors on cmake/zig, remove trailing whitespace * Add openBLAS support for sgemm() in compute_forward_out_prod()	2023-11-17 16:48:19 +02:00
Andrew Godfrey	947f64f163	finetune : zero the loraB initial vectors (#4082) * finetune : zero the loraB initial vectors Without this, the first iteration is starting out far from the base model, instead of exactly on it. Zeroing loraB is what the paper recommends. loralib also zeroes at least one of the init vector pairs (though it departs from the paper in using a different distribution for the other vector, in some cases). * tabs to spaces * Use ggml_set_zero instead of adding a new function	2023-11-17 11:23:11 +01:00
Andrew Godfrey	b83e149ec6	cuda : get_row_rounding F32 (#4095) * Fix #4017 * Update ggml-cuda.cu Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update ggml-cuda.cu Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-17 10:01:15 +02:00
Georgi Gerganov	4f447a4833	llama : fix data units (#4101) * llama : fix data units ggml-ci * Revert "llama : fix data units" This reverts commit `f5feac831f`. * llama : disambiguate data units ggml-ci	2023-11-17 10:00:15 +02:00
Kerfuffle	91f6499393	Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) * gguf-py: gguf-dump: Respect --no-tensor flag in JSON mode. * Respect add_bos_token GGUF metadata value * gguf-py: Try to fix SpecialVocab giving up too easily for the Nth time	2023-11-16 19:14:37 -07:00
texmex76	8da46278e1	gguf : fix potential infinite loops while parsing (#4100) Co-authored-by: Bernhard Gstrein <gstrein@cs.uni-freiburg.de>	2023-11-16 17:01:48 +02:00
Jared Van Bortel	a6fc554e26	llama : restore prefix space in llama tokenizer (#4081)	2023-11-15 11:34:47 -05:00
slaren	1cf2850d52	ggml-cuda : increase max graph size (#4084)	2023-11-15 14:58:13 +02:00
Michael Potter	6bb4908a17	Fix MacOS Sonoma model quantization (#4052) Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-14 12:34:41 -05:00
Galunid	36eed0c42c	stablelm : StableLM support (#3586) * Add support for stablelm-3b-4e1t * Supports GPU offloading of (n-1) layers	2023-11-14 11:17:12 +01:00
afrideva	b46d12f86d	convert.py: also look for plain model.safetensors (#4043) * add safetensors to convert.py help message * Check for single-file safetensors model * Update convert.py "model" option help message * revert convert.py help message change	2023-11-13 18:03:40 -07:00
M. Yusuf Sarıgöz	bd90eca237	llava : fix regression for square images in #3613 (#4056)	2023-11-13 18:20:52 +03:00
Georgi Gerganov	3d68f364f1	ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) ggml-ci	2023-11-13 16:55:52 +02:00

2711 Commits