llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-16 07:08:28 +01:00

Author	SHA1	Message	Date
haopeng	42ae10bbcd	add cmake rvv support (#10411)	2024-11-19 21:10:31 +01:00
FirstTimeEZ	a43178299c	ggml : fix undefined reference to 'getcpu' (#10354) https://github.com/ggerganov/llama.cpp/issues/10352	2024-11-17 10:39:22 +02:00
Johannes Gäßler	8a43e940ab	ggml: new optimization interface (ggml/988)	2024-11-17 08:30:29 +02:00
Georgi Gerganov	db4cfd5dbc	llamafile : fix include path (#0) ggml-ci	2024-11-16 20:36:26 +02:00
Dan Johansson	1e58ee1318	ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)	2024-11-16 01:53:37 +01:00
Srihari-mcw	74d73dc85c	Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314)	2024-11-15 22:27:00 +01:00
Georgi Gerganov	09ecbcb596	cmake : fix ppc64 check (whisper/0) ggml-ci	2024-11-15 15:44:06 +02:00
Eve	18429220bd	AVX BF16 and single scale quant optimizations (#10212) * use 128 bit loads (I've tried 256->128 to death and it's slower) * double accumulator * avx bf16 vec dot * +3% q4_0 inference * +7% tg +5% pp compared to master * slower f16c version, kept for reference * 256b version, also slow. I tried :) * revert f16 * faster with madd * split to functions * Q8_0 and IQ4_NL, 5-7% faster * fix potential overflow (performance reduced) * 16 bit add for q4_0 only * merge	2024-11-15 12:47:58 +01:00
Charles Xu	1607a5e5b0	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) * backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-11-15 01:28:50 +01:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00

10 Commits