llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 06:39:25 +01:00

Author	SHA1	Message	Date
Georgi Gerganov	bc5ba007b2	server : check that the prompt fits in the slot's context (#10030 ) ggml-ci	2024-10-25 10:13:46 +03:00
Xuan Son Nguyen	958367bf53	server : refactor slot input data, move tokenizer to HTTP thread (#10023 ) * server : refactor slot input data, move tokenizer to HTTP thread * move prompt_tokens.empty() check * fix incorrect if branch * fix infinite generation loop * bring back infill validation * add infill test * try fixing format_infill * fix test * remove redundant code * rename completion to inference * update docs * use llama_tokens everywhere	2024-10-24 21:51:22 +02:00
Georgi Gerganov	40f2555797	ci : fix cmake flags for SYCL	2024-10-24 21:23:33 +03:00
Johannes Gäßler	167a515651	CUDA: fix insufficient buffer clearing for MMQ (#10032 )	2024-10-24 14:40:23 +02:00
Johannes Gäßler	c39665f589	CUDA: fix MMQ for non-contiguous src0, add tests (#10021 ) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-10-24 11:09:36 +02:00
wwoodsTM	0a1c750c80	server : samplers accept the prompt correctly (#10019 )	2024-10-23 22:27:51 +03:00
Georgi Gerganov	190a37d797	sync : ggml	2024-10-23 17:23:55 +03:00
Georgi Gerganov	2d3aba9ee8	llama.vim : bump generation time limit to 3s [no ci]	2024-10-23 17:16:56 +03:00
Johannes Gäßler	80273a306d	CUDA: fix 1D im2col, add tests (ggml/993)	2024-10-23 16:50:02 +03:00
Daniel Bevenius	c19af0acb1	ggml : remove redundant set of contexts used field (ggml/978) This commit removes the setting of the `used` field of the contexts in the global state (g_state) in `ggml_init`. The motivation for this change is that I believe that this additional initialization might not be required after the changes in Commit 45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from whisper.cpp"), which changed the initialization of the contexts field from `{ 0 }` to `{ { 0 } }`: ```console g_state = (struct ggml_state) { - /*.contexts =*/ { 0 }, + /*.contexts =*/ { { 0 } }, }; ``` My understanding is that the `{0}` initialization might not have zero-initialized all the nested fields in every array element because of compiler differences, and might have been the reason for having the explicit setting of the `used` fields to false.	2024-10-23 16:50:02 +03:00
Michael Coppola	ac113a0fee	llama.vim : add classic vim support (#9995 ) * added classic vim support * fixed ring update, removed blank line * minor * minor * minor doc update * removed uneeded var * minor * minor * fixed job_start creating new scratch buffers * fixed job_start creating new scratch buffers * fixed ghost text indenting when expandtab is on * removed unused code * minor * unified fim_on_exit * minor * vim ghost text rendering now uses pos_x and pos_y parameters * renamed _hlgroup to hlgroup_ * renamed _ghost_text to ghost_text_, moved nvim/vim detection to llama#init() * minor --------- Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2024-10-23 14:09:26 +03:00
Jun Hee Yoo	4c9388fb96	metal : add POOL2D and fix IM2COL (#9943 ) * add pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix im2col and add unittest for N>=1024 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * add tests for N % 1024 != 0 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * remove trailing whitespaces Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply suggestions Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply more optimization - original IM2COL kernel + _ext with MIN() Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review: change kernel name of pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix more formatting and enhance readability Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> --------- Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>	2024-10-23 13:33:45 +03:00
github-actions[bot]	873279b159	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg%2BXZeHgxW5hQA9fIKHsKCdOIUycTryeVw%3D' (2024-10-09) → 'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c%2BcHUJwA%3D' (2024-10-18)	2024-10-23 01:28:07 +00:00
Xuan Son Nguyen	c8c07d658a	llama : fix empty batch causing llama_batch_allocr to crash (#9966 ) * llama : fix empty batch cause llama_batch_allocr to crash * move batch_allocr inside decode/encode_internal * fix build * add GGML_ASSERT * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-10-22 16:59:02 +02:00
Daniel Bevenius	19d900a756	llama : rename batch to ubatch (#9950 ) This commit renames the member field batch in llm_build_context to ubatch, and also the parameter batch in llama_build_graph, and llama_set_inputs to ubatch. The motivation for this change is to make the code more readable (considering there are the structs llama_batch and llama_sbatch), and consistent with other parts of the code base where parameters/fields of type llama_ubatch are named ubatch.	2024-10-22 16:31:06 +03:00
Molly Sophia	11d47057a5	Rwkv chat template fix (#10001 ) * llama: remove useless template matching for rwkv-world Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * converter: Add comment about the hack for rwkv models Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update src/llama.cpp Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-10-22 15:22:26 +02:00
Xuan Son Nguyen	c421ac072d	lora : warn user if new token is added in the adapter (#9948 )	2024-10-22 13:08:41 +02:00
Molly Sophia	4ff7fe1fb3	llama : add chat template for RWKV-World + fix EOT (#9968 ) * Add chat template for RWKV-World Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Fix the chat template not being used Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV v6: Set EOT token to ``\n\n`` Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * readme: add rwkv into supported model list Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-10-22 13:33:37 +03:00
leo-pony	6b8447352d	[CANN] Adapt to dynamically loadable backends mechanism (#9970 ) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class * Handle the review comments of this pull request	2024-10-22 16:16:01 +08:00
Daniel Bevenius	674804a996	arg : fix typo in embeddings argument help [no ci] (#9994 ) This commit fixes two typos in the help text for the `--embd-normalize` and `--embd-separator` arguments. It also updates common.h which contain the same typo in two comments.	2024-10-22 10:40:02 +03:00
Georgi Gerganov	e94a138d64	llama.vim : fix info text display [no ci] (#9787 )	2024-10-22 00:37:55 +03:00
Georgi Gerganov	e01c67affe	llama.vim : move info to the right of screen [no ci] (#9787 ) 'eol' messes up the rendering with nvim v0.10.2 for some reason	2024-10-21 22:53:18 +03:00
Asghar Ghorbani	994cfb1acb	readme : update UI list (#9972 ) add PocketPal AI app	2024-10-21 21:20:59 +03:00
Daniel Bevenius	94008cc760	arg : fix attention non-causal arg value hint (#9985 ) This commit updates the argument value hint for the `--attention` argument to `non-causal`. The motivation for this change is that the only values for this argument are `causal` and `non-causal`.	2024-10-21 21:12:52 +03:00
Georgi Gerganov	dbd5f2f573	llama.vim : plugin for Neovim (#9787 )	2024-10-21 20:25:02 +03:00
Georgi Gerganov	f594bc80ba	ggml : add asserts for type conversion in fattn kernels (#9971 ) ggml-ci	2024-10-21 16:20:46 +03:00
Radoslav Gerganov	d5ebd79c76	rpc : pack only RPC structs (#9959 )	2024-10-21 13:35:40 +03:00
Georgi Gerganov	55e47786e3	llama : default sampling changes + greedy update (#9897 ) * llama : deprecate softmax sampler + fix dist sampler ggml-ci * tests : replace macros with functions ggml-ci * sampling : change temperature sampler logic For t <= 0.0f, keep the max logit intact and set the rest to -inf * cont : no need for special "greedy" logic top-k == 1 is the same * tests : init prob correctly * llama : handle temp <= 0.0 in the temp_ext sampler too ggml-ci * cont : avoid extra loop in temperature sampler for sub-zero temp ggml-ci	2024-10-21 09:46:40 +03:00
Georgi Gerganov	bc21975084	speculative : fix handling of some input params (#9963 ) * speculative : fix batch sizes at initialization ggml-ci * speculative : handle params.n_predict == -1 * speculative : limit batch size to llama_n_batch	2024-10-21 09:37:12 +03:00
Neo Zhang Jianyu	1db8c84fc6	fix mul_mat_vec_q and *_vec_q error (#9939 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-10-21 14:26:09 +08:00
Loïc Carrère	45f097645e	readme : update bindings list (#9951 ) Update the binding list by adding LM-Kit.NET (C# & VB.NET)	2024-10-20 19:25:41 +03:00
icppWorld	7cab2083c7	readme : update infra list (#9942 ) llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.	2024-10-20 19:01:34 +03:00
Xuan Son Nguyen	cda0e4b648	llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745 ) * refactor llama_batch_get_one * adapt all examples * fix simple.cpp * fix llama_bench * fix * fix context shifting * free batch before return * use common_batch_add, reuse llama_batch in loop * null terminated seq_id list * fix save-load-state example * fix perplexity * correct token pos in llama_batch_allocr	2024-10-18 23:18:01 +02:00
Radoslav Gerganov	afd9909a64	rpc : backend refactoring (#9912 ) * rpc : refactor backend Use structs for RPC request/response messages * rpc : refactor server	2024-10-18 14:33:58 +03:00
Ouadie EL FAROUKI	87421a23e8	[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705 ) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-10-18 06:46:16 +01:00
Ma Mingfei	60ce97c9d8	add amx kernel for gemm (#8998 ) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-10-18 13:34:36 +08:00
Georgi Gerganov	8901755ba3	server : add n_indent parameter for line indentation requirement (#9929 ) ggml-ci	2024-10-18 07:32:19 +03:00
Daniel Bevenius	6f55bccbb8	llama : rename batch_all to batch (#8881 ) This commit addresses the TODO in the code to rename the `batch_all` parameter to `batch` in `llama_decode_internal`.	2024-10-18 01:41:51 +02:00
Georgi Gerganov	17bb928080	readme : remove --memory-f32 references (#9925 )	2024-10-17 23:43:05 +03:00
Georgi Gerganov	9f45fc1e99	llama : change warning to debug log	2024-10-17 23:27:42 +03:00
Georgi Gerganov	99bd4ac28c	llama : infill sampling handle very long tokens (#9924 ) * llama : infill sampling handle very long tokens ggml-ci * cont : better indices ggml-ci	2024-10-17 22:32:47 +03:00
Tim Wang	3752217ed5	readme : update bindings list (#9918 ) Co-authored-by: Tim Wang <tim.wang@ing.com>	2024-10-17 09:57:14 +03:00
Diego Devesa	f010b77a37	vulkan : add backend registry / device interfaces (#9721 ) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-10-17 02:46:58 +02:00
Gilad S.	2194200278	fix: allocating CPU buffer with size `0` (#9917 )	2024-10-17 01:34:22 +02:00
Gilad S.	73afe681aa	fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875 ) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment	2024-10-17 00:36:51 +02:00
Daniel Bevenius	9e04102448	llama : suppress conversion from 'size_t' to 'int' (#9046 ) * llama : suppress conversion from 'size_t' to 'int' This commit updates llm_tokenizer_spm.tokenize to suppress/remove the following warnings that are generated on Windows when using MSVC: ```console src\llama-vocab.cpp(211,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data src\llama-vocab.cpp(517,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data ``` This is done by adding a cast for the size_t returned from symbols.size(). I believe this is safe as it seems unlikely that symbols, which stores an entry for each UTF8 character, would become larger than INT_MAX. The motivation for this change is to reduce the number of warnings that are currently generated when building on Windows. * squash! llama : suppress conversion from 'size_t' to 'int' Move cast into for loop.	2024-10-16 20:34:28 +03:00
Daniel Bevenius	dbf18e4de9	llava : fix typo in error message [no ci] (#9884 )	2024-10-16 20:24:05 +03:00
Joe Eli McIlvain	66c2c93082	grammar : fix JSON Schema for string regex with top-level alt. (#9903 ) Prior to this commit, using a JSON Schema containing a string with `pattern` regular expression that uses top-level alternation (e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON output from the constrained sampling grammar, because it ended up creating a grammar rule like this for the string: ``` thing ::= "\"" "A" | "B" | "C" | "D" "\"" space ``` Note that this rule will only match a starting quote for the "A" case, and will only match an ending quote for the "D" case, so this rule will always produce invalid JSON when used for sampling (that is, the JSON will always be lacking the starting quote, the ending quote, or both). This was fixed in a simple way by adding parentheses to the generated rule (for all string pattern rules, to keep it simple), such that the new generated rule looks like this (correct): ``` thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space ```	2024-10-16 19:03:24 +03:00
Molly Sophia	10433e8b45	llama : add tensor name for "result_norm" (#9907 ) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-10-16 13:10:21 +03:00
Alexey Parfenov	1f66b699c4	server : fix the disappearance of the end of the text (#9867 ) * server: fix the disappearance of the end of the text when streaming with stop strings * simplify "send text" checks	2024-10-16 11:35:53 +03:00

4125 Commits