3265 Commits

Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support (#7610)
* tests : add non-cont concat tests

* cuda : non-cont concat support

ggml-ci
b3037
2024-05-29 15:38:26 +03:00
Radoslav Gerganov
210d99173d
llama-bench : add support for the RPC backend (#7435) b3036 2024-05-29 14:45:44 +03:00
slaren
87bdf2a199
ggml : use atomic_flag for critical section (#7598)
* ggml : use atomic_flag for critical section

* add windows shims
b3035
2024-05-29 13:36:39 +02:00
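
For context, the atomic_flag change above maps to a standard spinlock pattern. Below is a minimal C++ sketch with illustrative function names; ggml's real implementation is in C (C11 stdatomic) plus the Windows shims the commit mentions.

```cpp
#include <atomic>

// A spinlock built on std::atomic_flag: test_and_set returns the previous
// value, so a thread spins until it is the one that flipped the flag from
// clear to set. Names here are illustrative, not ggml's actual symbols.
static std::atomic_flag g_critical = ATOMIC_FLAG_INIT;

static void critical_section_start() {
    while (g_critical.test_and_set(std::memory_order_acquire)) {
        // busy-wait until the holder calls clear()
    }
}

static void critical_section_end() {
    g_critical.clear(std::memory_order_release);
}
```
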
Georgi Gerganov
00281b7be3
scripts : remove mpi remnants 2024-05-29 14:31:18 +03:00
Georgi Gerganov
2ab977282b
sync : ggml b3033 2024-05-29 14:29:52 +03:00
Georgi Gerganov
72de268bec
ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggml-ci
2024-05-29 14:29:33 +03:00
Akarshan Biswas
0e8d8bfd6c
Add Arc A750 and Arch Linux to readme-sycl.md as verified GPU model and Linux distro (#7605) 2024-05-29 16:53:47 +10:00
zhouwg
504f0c340f
ggml : fix typo in ggml.c (#7603) b3030 2024-05-29 04:09:31 +02:00
Meng, Hengyu
b864b50ce5
[SYCL] Align GEMM dispatch (#7566)
* align GEMM dispatch
b3029
2024-05-29 07:00:24 +08:00
caitianchi
c38d152d7d fix warnings 2024-05-29 04:35:08 +08:00
caitianchi
07f48f9669 fix warnings 2024-05-29 04:09:44 +08:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes (#7500)
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no match is found.
b3028
2024-05-28 21:46:34 +02:00
Georgi Gerganov
6bd12ce409
sycl : fix assert (#7563) b3027 2024-05-28 22:22:50 +03:00
caitianchi
02eb445d73 sync master 2024-05-29 03:06:58 +08:00
tc-mb
28d4a7f9cc
Merge pull request #8 from OpenBMB/master
sync master
2024-05-29 03:03:26 +08:00
tc-mb
8bd47ce5d6
Merge pull request #7 from OpenBMB/prepare-PR
sync master
2024-05-29 02:50:30 +08:00
tc-mb
8767ce29cf
Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR 2024-05-29 02:49:59 +08:00
Giuseppe Scrivano
5442939fcc
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116
Still needs some more changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small granite models

it works only for the small models 3b and 8b.

The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
b3026
2024-05-28 21:49:49 +03:00
caitianchi
b37ab0b1e5 add link 2024-05-29 02:21:41 +08:00
caitianchi
9495504e7b replace and organize code 2024-05-29 01:52:26 +08:00
caitianchi
3c306f18c8 clear code 2024-05-29 01:50:59 +08:00
k.h.lai
56411a950f
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) b3025 2024-05-28 19:25:08 +02:00
caitianchi
056d178160 rename wrapper 2024-05-29 00:18:17 +08:00
Radoslav Gerganov
2b737caae1
rpc : resource management rework (#7562)
* rpc : resource management rework

* address review comments
b3024
2024-05-28 18:13:36 +03:00
fairydreaming
ee3dff6b8e
Add support for DeepseekV2ForCausalLM (#7519)
* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)

* llama : add inference support for LLM_ARCH_DEEPSEEK2

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b3023
2024-05-28 17:07:05 +02:00
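
The llm_build_moe_ffn() bullet above only names the two new arguments; here is a minimal sketch of the behavior it describes, with plain floats standing in for ggml tensors and all names hypothetical.

```cpp
#include <vector>

// scale_w: whether to scale the routing weights of the selected experts;
// w_scale: the numerical scaling factor (per the commit message above).
static void maybe_scale_expert_weights(std::vector<float> & weights,
                                       bool scale_w, float w_scale) {
    if (!scale_w) {
        return;
    }
    for (float & w : weights) {
        w *= w_scale;
    }
}
```
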
Georgi Gerganov
edc29433fa
tests : fix test-tokenizer-0.sh 2024-05-28 15:04:09 +03:00
Georgi Gerganov
8b99e2aa66
llama : handle unknown utf8 bytes (#7588) b3021 2024-05-28 13:55:35 +03:00
Brian
271ff3fc44
github: add refactor to issue template (#7561)
* github: add refactor issue template [no ci]

* Update 07-refactor.yml
2024-05-28 20:27:27 +10:00
Neo Zhang
e2b065071c
[SYCL] fix ggml_sycl_mul_mat_id() to match the change of API (#7436)
* fix mul_mat_id to match the change of API

* rm comment

* rm unused or duplicated code, rename per review comments
b3019
2024-05-28 10:53:37 +01:00
caitianchi
6366d62d6b update cmakelist 2024-05-28 16:35:13 +08:00
Georgi Gerganov
0548a4187f
ggml : generalize GGML_OP_CONCAT (#7563)
* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
b3018
2024-05-28 11:04:19 +03:00
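
The changelog above implies concat previously supported only one axis (hence the new "dim != 2" tests). A minimal sketch of the generalized shape rule follows; it is illustrative only, not the actual ggml kernel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Concatenating two 4-D tensors along dimension `dim`: the concat axis
// adds up, every other axis must match.
static std::array<int64_t, 4> concat_shape(const std::array<int64_t, 4> & a,
                                           const std::array<int64_t, 4> & b,
                                           int dim) {
    std::array<int64_t, 4> out = a;
    for (int i = 0; i < 4; ++i) {
        if (i == dim) {
            out[i] = a[i] + b[i];
        } else {
            assert(a[i] == b[i]); // non-concat dims must agree
        }
    }
    return out;
}
```
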
caitianchi
e73a0c7c2f update cmakelist 2024-05-28 15:26:09 +08:00
mgroeber9110
9335b969e8
server: do not remove whitespace at the start of a completion chunk (#7524) 2024-05-28 14:55:51 +10:00
Nathan Epstein
c41767154e
Markdownish code block fix (#7571)
* markdownish codeblock fix

* updating regexes
2024-05-28 14:41:14 +10:00
Ikko Eltociear Ashimine
74b239b3d5
llava : update clip.h (#7580)
overriden -> overridden
b3015
2024-05-28 12:48:16 +10:00
Djip007
852aafb163
update HIP_UMA #7399 (#7414)
* update HIP_UMA #7399

add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled.
- get x2 on prompt eval and x1.5 on token gen with rocm6.0 on ryzen 7940HX iGPU (780M/gfx1103)

* simplify code, more consistent style

---------

Co-authored-by: slaren <slarengh@gmail.com>
b3014
2024-05-28 01:40:47 +02:00
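
A minimal sketch of the allocation path the commit describes, assuming a hypothetical helper name; the real change lives inside ggml's HIP backend and carries error handling this sketch omits.

```cpp
#include <hip/hip_runtime.h>

// Under LLAMA_HIP_UMA: allocate unified memory, then advise coarse-grain
// coherence, which is what the commit credits for the iGPU speedup.
static void * alloc_hip_uma(size_t size, int device) {
    void * ptr = nullptr;
    if (hipMallocManaged(&ptr, size) != hipSuccess) {
        return nullptr;
    }
    hipMemAdvise(ptr, size, hipMemAdviseSetCoarseGrain, device);
    return ptr;
}
```
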
kunnis
0136966daf
adding in x64 targets to cmake presets (#7574) 2024-05-28 01:40:12 +02:00
Johannes Gäßler
10b1e45876
make: add --device-debug to NVCC debug flags (#7542) b3012 2024-05-27 19:34:40 +02:00
agray3
197c00681b
Allow multiple copy function pointers for CUDA graph kernel param updates (#7565)
CUDA graphs require parameter updates to kernels associated with
GGML_OP_CPY nodes. Previously the implementation only checked for a
single CUDA kernel in such nodes, but this caused a bug in cases where
2 such kernels exist. This fixes the issue by using a vector to allow
multiple function pointers to be stored and checked against.

Fixes #7942
b3011
2024-05-27 19:33:42 +02:00
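
The paragraph above fully states the fix; a minimal sketch of the single-pointer-to-vector change follows, with hypothetical names rather than the actual ggml-cuda symbols.

```cpp
#include <algorithm>
#include <vector>

// Before: one remembered copy-kernel pointer per graph. After: every kernel
// pointer seen for GGML_OP_CPY nodes is recorded and checked against when
// updating CUDA graph kernel parameters.
struct cuda_graph_cpy_state {
    std::vector<void *> kernels;

    void record(void * k) {
        if (std::find(kernels.begin(), kernels.end(), k) == kernels.end()) {
            kernels.push_back(k);
        }
    }

    bool contains(void * k) const {
        return std::find(kernels.begin(), kernels.end(), k) != kernels.end();
    }
};
```
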
caitianchi
d8974b8ea6 support ollama 2024-05-28 01:13:57 +08:00
AidanBeltonS
95f84d5ce8
Fix q_xxs using mul_mat_q (#7459) b3010 2024-05-27 22:04:51 +05:30
AidanBeltonS
5487593bc7
Add freq factors (#7495) 2024-05-27 18:04:09 +05:30
Georgi Gerganov
1d8fca72ae
metal : add GGML_OP_REPEAT kernels (#7557)
ggml-ci
b3008
2024-05-27 12:10:19 +03:00
Georgi Gerganov
62bfef5194
metal : disable FA kernel for HS=256 (#7556)
ggml-ci
b3007
2024-05-27 10:38:39 +03:00
Georgi Gerganov
eaf6e03174
llama : add comments about experimental flags (#7544) b3006 2024-05-27 09:24:13 +03:00
Brian
d6ef0e77dd
github: add self-sorted issue ticket forms (#7543)
* github: add self-sorted issue ticket forms [no ci]

* github: consolidate BSD in bug issue ticket

* github: remove contact from bug ticket template [no ci]

* github: remove bios from os dropdown in bug report [no ci]
2024-05-27 10:54:30 +10:00
caitianchi
8541e99629 better pos_embed in clip 2024-05-27 04:27:54 +08:00
caitianchi
2997a680d2 change for ollama 2024-05-27 03:42:56 +08:00
caitianchi
18fe620976 change for ollama 2024-05-27 03:29:55 +08:00
caitianchi
d9fbc1d1c5 add positions index 2024-05-27 03:18:35 +08:00