haopeng
42ae10bbcd
add cmake rvv support ( #10411 )
2024-11-19 21:10:31 +01:00
FirstTimeEZ
a43178299c
ggml : fix undefined reference to 'getcpu' ( #10354 )
...
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-17 10:39:22 +02:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Georgi Gerganov
db4cfd5dbc
llamafile : fix include path ( #0 )
...
ggml-ci
2024-11-16 20:36:26 +02:00
Dan Johansson
1e58ee1318
ggml : optimize Q4_0 into Q4_0_X_Y repack ( #10324 )
2024-11-16 01:53:37 +01:00
Srihari-mcw
74d73dc85c
Make updates to fix issues with clang-cl builds while using AVX512 flags ( #10314 )
2024-11-15 22:27:00 +01:00
Georgi Gerganov
09ecbcb596
cmake : fix ppc64 check (whisper/0)
...
ggml-ci
2024-11-15 15:44:06 +02:00
Eve
18429220bd
AVX BF16 and single scale quant optimizations ( #10212 )
...
* use 128 bit loads (i've tried 256->128 to death and its slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kep for reference
* 256b version, also slow. i tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
2024-11-15 12:47:58 +01:00
Charles Xu
1607a5e5b0
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels ( #9921 )
...
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-15 01:28:50 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00