llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-30 05:43:03 +01:00

Author	SHA1	Message	Date
Georgi Gerganov	841f27abdb	metal : optimize FA kernels (#10171 ) * ggml : add ggml_flash_attn_ext_get_prec * metal : use F16 precision in FA kernels ggml-ci * metal : minor clean-up * metal : compile-guard bf16 FA kernels ggml-ci * build : remove obsolete compile flag [no ci] * metal : prevent int overflows [no ci] * cuda : disable BF16 FA ggml-ci * metal : fix BF16 requirement for FA kernels ggml-ci * make : clean-up [no ci]	2024-11-08 13:47:22 +02:00
snadampal	2319126a70	fix q4_0_8_8 format for corrupted tokens issue (#10198 ) Co-authored-by: EC2 Default User <ec2-user@ip-172-31-62-167.us-west-2.compute.internal>	2024-11-07 09:02:08 +01:00
Zhiyuan Li	3bcd40b3c5	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133 ) * rwkv6: rename to wkv6 * rwkv6: support avx2 avx512 armv8 armv9 * rwkv6: update cuda file name * rwkv6: rename params * wkv on sycl * sycl: add some ops * sycl: Enhance OP support judgment * wkv6: drop armv9 and transfer to GGML style ggml-ci * sync : ggml * update the function to use appropriate types * fix define error * Update ggml/src/ggml-cpu.c * add appropriate asserts * move element-wise functions outside * put the declaration outside the loop * rewrite to be more inline with the common pattern for distributing threads * use recommended way GGML_TENSOR_LOCALS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-07 15:19:10 +08:00
Georgi Gerganov	5c333e0140	metal : add BF16 support (#8439 ) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support	2024-11-06 19:53:51 +02:00
Diego Devesa	94d8cb8be1	metal : fix from ptr buffer name (#10189 )	2024-11-06 12:10:07 +01:00
Georgi Gerganov	1dc04b2dee	ggml : adjust is_first_call init value (#10193 ) ggml-ci	2024-11-06 11:20:10 +02:00
Georgi Gerganov	a1eaf6a960	metal : add quantized FA support (#10149 ) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci]	2024-11-06 10:24:23 +02:00
Diego Devesa	a9e8a9a030	ggml : fix arch check in bf16_to_fp32 (#10164 )	2024-11-04 23:17:01 +01:00
Eve	3407364776	Q6_K AVX improvements (#10118 ) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86	2024-11-04 23:06:31 +01:00
Diego Devesa	d5a409e57f	ggml : fix gelu tables initialization (#10172 )	2024-11-04 20:06:58 +01:00
Diego Devesa	401558b7ba	ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167 )	2024-11-04 17:34:08 +01:00
snadampal	6a066b9978	fix build break on arm64 linux (#10166 ) This fixes the build break from the recent changes to move the CPU backend to separate files https://github.com/ggerganov/llama.cpp/pull/10144	2024-11-04 16:08:33 +01:00
Diego Devesa	ea02c753eb	cuda : clear error after changing peer access (#10153 )	2024-11-04 13:10:23 +01:00
Georgi Gerganov	05697f670b	metal : simplify f16 and f32 dequant kernels (#0 )	2024-11-04 13:49:34 +02:00
Georgi Gerganov	f8e58135cf	metal : move dequantize templates to beginning of MSL source (#0 )	2024-11-04 13:44:06 +02:00
leo-pony	329ed914c9	CANN: adjust backend registry refactor. (#10158 ) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.	2024-11-04 19:08:22 +08:00
Yuri Khrustalev	284e5b0275	cmake : make it possible linking ggml as external lib (ggml/1003)	2024-11-04 10:33:11 +02:00
Plamen Minev	e2292aaa17	metal : fix minor string leaks (ggml/1004)	2024-11-04 10:33:10 +02:00
Diego Devesa	9f40989351	ggml : move CPU backend to a separate file (#10144 )	2024-11-03 19:34:08 +01:00
Georgi Gerganov	08828a6d7d	metal : minor fixup in FA kernel (#10143 ) * metal : minor fixup in FA kernel ggml-ci * metal : use the unrolled loop variable * metal : remove unused var	2024-11-03 15:18:40 +02:00
Diego Devesa	e991e3127f	llama : use smart pointers for ggml resources (#10117 )	2024-11-01 23:48:26 +01:00
Shupei Fan	418f5eef26	vulkan : improve ggml_vk_create_buffer error handling (#9898 )	2024-11-01 19:33:14 +01:00
Georgi Gerganov	1804adb0cf	ggml : remove ggml_scratch (#10121 ) ggml-ci	2024-11-01 12:58:45 +02:00
Georgi Gerganov	f221d56220	ggml : alloc ggml_contexts on the heap (whisper/2525)	2024-11-01 10:24:50 +02:00
Zhenwei Jin	e597e50794	build: fix build error in Windows env with OneAPI setup (#10107 )	2024-11-01 11:09:59 +08:00
Diego Devesa	c02e5ab2a6	llama : fix buffer checks for mamba and rwk (#10111 ) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-10-31 22:54:23 +01:00
Diego Devesa	dea5e86051	ggml : check tensor name lengths in gguf files (#10100 )	2024-10-31 11:40:59 +01:00
Sergio López	1329c0a75e	kompute: add mul_mat_q4_k shader (#10097 ) This is a more or less direct translation from the Metal implementation to GLSL. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-10-31 11:09:52 +02:00
Sergio López	61408e7fad	kompute: add backend registry / device interfaces (#10045 ) Get in line with the other backends by supporting the newer backend/device registry interfaces. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-10-30 17:01:52 +01:00
Diego Devesa	b9e02e8184	ggml : fix memory leaks when loading invalid gguf files (#10094 ) * ggml : fix gguf string leak when reading kv pairs fails * ggml : avoid crashing with GGML_ABORT when the KV has an invalid type * ggml : avoid crashing on failed memory allocations when loading a gguf file	2024-10-30 14:51:21 +01:00
xctan	fc83a9e584	ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029 ) * ggml : RISC-V vector gemv for q4_0_8x8 * ggml : Added WIP rvv q4_0_8x8 gemm * ggml : Added initial implementation of rvv gemm * ggml : optimize gemm to avoid register spillover * ggml : Fix GCC rvv load alignment issue * ggml : Format gemm rvv code * ggml : Fix a typo in RVV q4_0_8_8 GEMM	2024-10-30 09:00:40 +02:00
Diego Devesa	c5b0f4b5d9	llama : refactor model loader with backend registry (#10026 )	2024-10-30 02:01:23 +01:00
Changyeon Kim	8f275a7c45	ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763 ) * ggml: Add POOL2D OP for GPU ACC to the Vulkan. - The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend. - A GGML_OP_POOL_2D shader has been added. (Pooling) - The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * [fix] Correct the incorrect order of the parameters. fix casting to int. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> --------- Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>	2024-10-29 09:52:56 +01:00
R0CKSTAR	524afeec9d	musa: workaround for Guilty Lockup in cleaning src0 (#10042 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-10-28 10:02:48 +01:00
bssrdf	8c60a8a462	increase cuda_cpy block size (ggml/996) Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-10-26 10:33:56 +03:00
Georgi Gerganov	668750357e	metal : support permuted matrix multiplications (#10033 ) * metal : support permuted matrix multiplications ggml-ci * cont : use nb01 directly for row steps ggml-ci * cont : add comments [no ci] * metal : minor refactor * metal : minor	2024-10-25 22:26:15 +03:00
Srihari-mcw	2f8bd2b901	llamafile : extend sgemm.cpp support for Q5_0 models (#10010 )	2024-10-25 10:27:41 +03:00
Johannes Gäßler	167a515651	CUDA: fix insufficient buffer clearing for MMQ (#10032 )	2024-10-24 14:40:23 +02:00
Johannes Gäßler	c39665f589	CUDA: fix MMQ for non-contiguous src0, add tests (#10021 ) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-10-24 11:09:36 +02:00
Johannes Gäßler	80273a306d	CUDA: fix 1D im2col, add tests (ggml/993)	2024-10-23 16:50:02 +03:00
Daniel Bevenius	c19af0acb1	ggml : remove redundant set of contexts used field (ggml/978) This commit removes the setting of the `used` field of the contexts in the global state (g_state) in `ggml_init`. The motivation for this change is that I believe that this additional initialization might not be required after the changes in Commit 45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from whisper.cpp"), which changed the initialization of the contexts field from `{ 0 }` to `{ { 0 } }`: ```console g_state = (struct ggml_state) { - /*.contexts =*/ { 0 }, + /*.contexts =*/ { { 0 } }, }; ``` My understanding is that the `{0}` initialization might not have zero-initialized all the nested fields in every array element because of compiler differences, and might have been the reason for having the explicit setting of the `used` fields to false.	2024-10-23 16:50:02 +03:00
Jun Hee Yoo	4c9388fb96	metal : add POOL2D and fix IM2COL (#9943 ) * add pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix im2col and add unittest for N>=1024 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * add tests for N % 1024 != 0 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * remove trailing whitespaces Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply suggestions Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply more optimization - original IM2COL kernel + _ext with MIN() Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review: change kernel name of pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix more formatting and enhance readability Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> --------- Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>	2024-10-23 13:33:45 +03:00
leo-pony	6b8447352d	[CANN] Adapt to dynamically loadable backends mechanism (#9970 ) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models whose type is Q4_0 class * Handle the review comments of this pull request	2024-10-22 16:16:01 +08:00
Georgi Gerganov	f594bc80ba	ggml : add asserts for type conversion in fattn kernels (#9971 ) ggml-ci	2024-10-21 16:20:46 +03:00
Radoslav Gerganov	d5ebd79c76	rpc : pack only RPC structs (#9959 )	2024-10-21 13:35:40 +03:00
Neo Zhang Jianyu	1db8c84fc6	fix mul_mat_vec_q and *_vec_q error (#9939 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-10-21 14:26:09 +08:00
Radoslav Gerganov	afd9909a64	rpc : backend refactoring (#9912 ) * rpc : refactor backend Use structs for RPC request/response messages * rpc : refactor server	2024-10-18 14:33:58 +03:00
Ouadie EL FAROUKI	87421a23e8	[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705 ) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-10-18 06:46:16 +01:00
Ma Mingfei	60ce97c9d8	add amx kernel for gemm (#8998 ) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-10-18 13:34:36 +08:00
Diego Devesa	f010b77a37	vulkan : add backend registry / device interfaces (#9721 ) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-10-17 02:46:58 +02:00

275 Commits