llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-27 20:43:07 +01:00

Author	SHA1	Message	Date
Johannes Gäßler	ba421dd04e	gguf-test: tensor data comparison	2025-01-18 09:49:47 +01:00
Georgi Gerganov	7000623c00	tests : fix gguf context use in same_tensor_data	2025-01-17 16:26:12 +02:00
Georgi Gerganov	e872097c35	cmake : apply only sanitizer flags at top level ggml-ci	2025-01-17 15:48:39 +02:00
Georgi Gerganov	9d1b20ad1a	cmake : move llama.cpp compile flags to top level lists ggml-ci	2025-01-17 15:40:03 +02:00
Georgi Gerganov	9a03bc811f	cmake : move sanitizer flags to llama_add_compile_flags ggml-ci	2025-01-17 15:33:36 +02:00
Georgi Gerganov	ce293d837c	tests : fix compile warnings ggml-ci	2025-01-17 15:22:36 +02:00
Georgi Gerganov	72dc7bff4d	cmake : add sanitizer flags for llama.cpp ggml-ci	2025-01-17 15:22:25 +02:00
codezjx	3edfa7d375	llama.android: add field formatChat to control whether to parse special tokens when send message (#11270 )	2025-01-17 14:57:56 +02:00
Radoslav Gerganov	667d72846c	rpc : early register backend devices (#11262 ) Early register RPC devices and do not propagate RPC specifics in the llama model structures. ref: #10609	2025-01-17 10:57:09 +02:00
Georgi Gerganov	a133566d34	vocab : fix double-eos check (#11273 ) ggml-ci	2025-01-17 09:28:00 +02:00
David Renshaw	960ec65273	llama : fix deprecation message: vocabable -> vocab (#11269 )	2025-01-17 08:12:01 +01:00
musoles	7a689c415e	README : added kalavai to infrastructure list (#11216 )	2025-01-17 01:10:49 +01:00
Jeff Bolz	bd38ddea01	vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166 ) * vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl Shaders are based on cpy.cu. * vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32 * ggml: copy q->f32 assumes some contiguity in the destination	2025-01-16 22:47:10 +01:00
Jeff Bolz	466300fe14	vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206 ) Do masking on whole dwords, fetch all scales at once.	2025-01-16 22:23:49 +01:00
Jeff Bolz	206bc53422	vulkan: optimize coopmat2 q2_k dequant function (#11130 )	2025-01-16 22:16:39 +01:00
RunningLeon	4dbc8b9cb7	llama : add internlm3 support (#11233 ) * support internlm3 * fix lint	2025-01-16 20:10:38 +02:00
Johannes Gäßler	9c8dcefe17	CUDA: backwards pass for misc. ops, add tests (#11257 ) * CUDA: backwards pass for misc. ops, add tests * remove restrict from pointers	2025-01-16 16:43:38 +01:00
Xuan Son Nguyen	681149ced2	llama : add `llama_model_load_from_splits` (#11255 ) * llama : add `llama_model_load_from_splits` * update	2025-01-16 13:54:08 +01:00
fj-y-saito	c67cc9837d	ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227 ) * Add SVE support for q4_K_q8_K * Update ggml/src/ggml-cpu/ggml-cpu-quants.c change to use K_SCALE_SIZE Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-16 11:11:49 +02:00
Eve	adc5dd92e8	vulkan: scale caching for k quants + misc fixes (#11081 ) * q6_k scale caching * 16 bit unpack * q4_k test (slow) * revert it * q3_k * q2_k * little stuff * try precalculating products of a and q2_k scales * Revert "try precalculating products of a and q2_k scales" This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b. * unpack should be u16, add vim swap to gitignore (about time) * better q4_k scales * q5_k * better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations * q2_k better dequant * q3_k optimizations * q3_k use hmask simd from cpu avx version * make the caches happy * q3_k separate out calculation * q2_k separate out * little stuff * use calc_superblock everywhere * q2_k optimize scale calculation * more barriers	2025-01-15 19:50:13 +00:00
Georgi Gerganov	f11cfdfd7f	ci : use -no-cnv in gguf-split tests (#11254 ) * ci : use -no-cnv in gguf-split tests ggml-ci * ci : use -no-cnv in requantize tests ggml-ci * scripts : fix [no ci]	2025-01-15 18:28:35 +02:00
Junil Kim	1d8504338e	fix: ggml: fix vulkan-shaders-gen build (#10448 ) * fix: ggml: fix vulkan-shaders-gen build The vulkan-shaders-gen target was not being built correctly in case of cross-compilation. Other outputs need to be built for the cross compile target, but vulkan-shaders-gen needs to be built for the host. * refactor: ggml: Improve vulkan-shaders-gen toolchain setup - Add GGML_SHADERS_GEN_TOOLCHAIN CMake option. - Auto-detect host toolchain if not set. * refactor: ggml: Improve vulkan-shaders-gen toolchain setup Use configure_file to generate host_toolchain.cmake from template * fix: ggml: Fix compile error Fix compile error not finding vulkan-shaders-gen * fix: vulkan-shaders-gen build and path handling Fix build issues with vulkan-shaders-gen: - Add target dependency for correct build order - Use CMAKE_HOST_SYSTEM_NAME for executable suffix - Fix MSVC output directory in host toolchain - Normalize path handling for cross-compilation * fix: improve host compiler detection in vulkan shader build Improve host compiler detection for vulkan shader generation: - Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches - Consolidate compiler detection logic - Fix Windows-specific MSVC detection - Ensure correct compiler search in cross-compilation * refactor: Simplify CMake function for detecting host compiler Simplified the CMake function to improve the process of detecting the host compiler. * fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt Since `vulkan-shader-gen.cpp` only requires the `glslc` executable and not the Vulkan headers or libraries, CMakeLists.txt needs to be corrected. (See: `ecc93d0558`) * refactor: Rename host_toolchain.cmake.in - Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in * refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN	2025-01-15 14:17:42 +01:00
Johannes Gäßler	432df2d5f9	RoPE: fix back, CUDA support for back + noncont. (#11240 ) * RoPE: fix back, CUDA support for back + noncont. * fix comments reg. non-cont. RoPE support [no-ci]	2025-01-15 12:51:37 +01:00
Daniel Bevenius	0ccd7f3eb2	examples : add embd_to_audio to tts-outetts.py [no ci] (#11235 ) This commit contains a suggestion for adding the missing embd_to_audio function from tts.cpp to tts-outetts.py. This introduces a dependency on numpy, which I was not sure is acceptable or not (only PyTorch was mentioned in the referenced PR). Also the README has been updated with instructions to run the example with llama-server and the python script. Refs: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2548377734	2025-01-15 05:44:38 +01:00
Akarshan Biswas	f446c2cf6a	SYCL: Add gated linear attention kernel (#11175 ) * SYCL: Add Gated Linear attention kernel * glahpp: add a space at the end of file * gla: Put the barrier inside the main logic loop	2025-01-15 11:20:17 +08:00
Xuan Son Nguyen	b4d92a59a2	ci : add -no-cnv for tests (#11238 )	2025-01-14 16:42:23 +02:00
Georgi Gerganov	bbf3e55e35	vocab : add dummy tokens for "no_vocab" type (#11231 ) * vocab : add dummy tokens for "no_vocab" type ggml-ci * vocab : minor [no ci]	2025-01-14 11:54:58 +01:00
ebraminio	c5bf0d1bd7	server : Improve code snippets direction between RTL text (#11221 )	2025-01-14 11:39:33 +01:00
Olivier Chafik	091592d758	Refactor test-chat-template.cpp (#11224 ) * Refactor test-chat-template * Update test-chat-template.cpp	2025-01-14 10:16:41 +00:00
Georgi Gerganov	44d1e796d0	sync : ggml	2025-01-14 10:39:42 +02:00
Georgi Gerganov	a4f3f5d8e6	scripts : sync gguf (cont)	2025-01-14 09:40:52 +02:00
Georgi Gerganov	48e1ae0e61	scripts : sync gguf	2025-01-14 09:36:58 +02:00
Georgi Gerganov	d00a80e89d	scripts : sync opencl	2025-01-14 09:19:58 +02:00
ebraminio	504af20ee4	server : (UI) Improve messages bubble shape in RTL (#11220 ) I simply have overlooked message bubble's tail placement for RTL text as I use the dark mode and that isn't visible there and this fixes it.	2025-01-13 20:23:31 +01:00
Xuan Son Nguyen	84a44815f7	cli : auto activate conversation mode if chat template is available (#11214 ) * cli : auto activate conversation mode if chat template is detected * add warn on bad template * update readme (writing with the help of chatgpt) * update readme (2) * do not activate -cnv for non-instruct models	2025-01-13 20:18:12 +01:00
Andreas Kieslinger	39509fb082	cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042 ) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-01-13 16:45:53 +01:00
Georgi Gerganov	a29f0870d4	contrib : add naming guidelines (cont) (#11177 )	2025-01-13 15:59:26 +02:00
ebraminio	437e05f714	server : (UI) Support for RTL text as models input or output (#11208 )	2025-01-13 14:46:39 +01:00
Georgi Gerganov	ca001f6656	contrib : add naming guidelines (cont) (#11177 )	2025-01-13 15:08:44 +02:00
Xuan Son Nguyen	00b4c3da62	common : support tag-based --hf-repo like on ollama (#11195 ) * common : support tag-based hf_repo like on ollama * fix build * various fixes * small fixes * fix style * fix windows build? * move common_get_hf_file to common.cpp * fix complain with noreturn	2025-01-13 13:56:23 +01:00
Georgi Gerganov	7426a26b24	contrib : add naming guidelines (#11177 ) * contrib : add naming guidelines * contrib : expand naming guidelines [no ci] * contrib : cont [no ci] * contrib : add `_t` suffix guideline [no ci] * contrib : cont [no ci] * minor [no ci] * contrib : move coding guidelines to correct section [no ci] * contrib : minor reword coding guidelines [no ci] * contrib : add TODO for preprocessor directives [no ci] * contrib : expand [no ci] * minor [no ci] * contrib : clarify `_context` suffix usage [no ci] * contrib : filename guidelines [no ci] * contrib : fix notes [no ci]	2025-01-13 14:46:36 +02:00
Daniel Bevenius	8f70fc3d1b	llama : remove 'd' from bad special token log (#11212 ) This commit removes the 'd' from the log message in llama-vocab.cpp when logging a bad special token. The motivation for this is that currently the output can look something like the following: ```console load: bad special token: 'tokenizer.ggml.image_token_id' = 128256d, using default id -1 ```	2025-01-13 13:38:20 +01:00
Radoslav Gerganov	1244cdcf14	ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211 ) Build fails when using HIP and GGML_BACKEND_DL: ``` /usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg' collect2: error: ld returned 1 exit status ``` This patch fixes this.	2025-01-13 13:31:41 +02:00
Eric Curtin	924518e2e5	Reset color before we exit (#11205 ) We don't want colors to leak post termination of llama-run. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-12 18:23:10 +00:00
Xuan Son Nguyen	9a483999a6	llama : fix chat template gguf key (#11201 )	2025-01-12 13:45:14 +01:00
Georgi Gerganov	08f10f69c3	llama : remove notion of CLS token (#11064 ) ggml-ci	2025-01-12 12:15:53 +02:00
Georgi Gerganov	afa8a9ec9b	llama : add `llama_vocab`, functions -> methods, naming (#11110 ) * llama : functions -> methods (#11110) * llama : add struct llama_vocab to the API (#11156) ggml-ci * hparams : move vocab params to llama_vocab (#11159) ggml-ci * vocab : more pimpl (#11165) ggml-ci * vocab : minor tokenization optimizations (#11160) ggml-ci Co-authored-by: Diego Devesa <slarengh@gmail.com> * lora : update API names (#11167) ggml-ci * llama : update API names to use correct prefix (#11174) * llama : update API names to use correct prefix ggml-ci * cont ggml-ci * cont ggml-ci * minor [no ci] * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174) ggml-ci * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174) ggml-ci --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-12 11:32:42 +02:00
Vinesh Janarthanan	c05e8c9934	gguf-py: fixed local detection of gguf package (#11180 ) * updated path to gguf package for non-installed setups * added reader.py to readme * Bumped gguf version to 0.15.0	2025-01-11 11:42:31 +02:00
Daniel Bevenius	2739a71e4b	convert : sort print supported models [no ci] (#11179 ) This commit sorts the list of supported models when printing them out. The motivation for this change is to make it easier to find a specific model in the list of supported models. For example: ```console $ ./convert_hf_to_gguf.py --print-supported-models Supported models: - ArcticForCausalLM - BaiChuanForCausalLM - BaichuanForCausalLM - BertForMaskedLM - BertModel - BitnetForCausalLM - BloomForCausalLM - BloomModel - CamembertModel - ChameleonForCausalLM - ChameleonForConditionalGeneration - ChatGLMForConditionalGeneration - ChatGLMModel - CodeShellForCausalLM - Cohere2ForCausalLM - CohereForCausalLM - DbrxForCausalLM - DeciLMForCausalLM - DeepseekForCausalLM - DeepseekV2ForCausalLM - DeepseekV3ForCausalLM - ExaoneForCausalLM - FalconForCausalLM - FalconMambaForCausalLM - GPT2LMHeadModel - GPTBigCodeForCausalLM - GPTNeoXForCausalLM - GPTRefactForCausalLM - Gemma2ForCausalLM - GemmaForCausalLM - GraniteForCausalLM - GraniteMoeForCausalLM - GrokForCausalLM - InternLM2ForCausalLM - JAISLMHeadModel - JinaBertForMaskedLM - JinaBertModel - LLaMAForCausalLM - LlamaForCausalLM - LlavaStableLMEpochForCausalLM - MPTForCausalLM - MT5ForConditionalGeneration - MambaForCausalLM - MambaLMHeadModel - MiniCPM3ForCausalLM - MiniCPMForCausalLM - MistralForCausalLM - MixtralForCausalLM - NemotronForCausalLM - NomicBertModel - OLMoForCausalLM - Olmo2ForCausalLM - OlmoForCausalLM - OlmoeForCausalLM - OpenELMForCausalLM - OrionForCausalLM - Phi3ForCausalLM - PhiForCausalLM - PhiMoEForCausalLM - PlamoForCausalLM - QWenLMHeadModel - Qwen2ForCausalLM - Qwen2MoeForCausalLM - Qwen2VLForConditionalGeneration - RWForCausalLM - RWKV6Qwen2ForCausalLM - RobertaModel - Rwkv6ForCausalLM - StableLMEpochForCausalLM - StableLmForCausalLM - Starcoder2ForCausalLM - T5EncoderModel - T5ForConditionalGeneration - T5WithLMHeadModel - UMT5ForConditionalGeneration - WavTokenizerDec - XLMRobertaForSequenceClassification - XLMRobertaModel - XverseForCausalLM ```	2025-01-11 05:50:33 +01:00
Daniel Bevenius	ba8a1f9c5b	examples : add README.md to tts example [no ci] (#11155 ) * examples : add README.md to tts example [no ci] * squash! examples : add README.md to tts example [no ci] Fix heading to be consistent with other examples, and add a quickstart section to README.md. * squash! examples : add README.md to tts example [no ci] Fix spelling mistake.	2025-01-10 13:16:16 +01:00

4509 Commits