llama.cpp/ggml/src
Eve 18429220bd
AVX BF16 and single scale quant optimizations (#10212)
* use 128 bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. i tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
2024-11-15 12:47:58 +01:00
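
For readers unfamiliar with the "double accumulator" and "bf16 vec dot" items in the commit message above, here is a minimal scalar sketch of the general idea. It is not the code from this commit, which uses AVX intrinsics. The names `bf16_t`, `bf16_to_f32`, `f32_to_bf16_trunc`, and `bf16_dot` are hypothetical and chosen only for illustration. The sketch assumes a BF16 value is the upper 16 bits of an IEEE-754 float32, and it keeps two independent running sums so that consecutive multiply-adds do not depend on each other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for this sketch: a bfloat16 value stored as raw bits. */
typedef struct { uint16_t bits; } bf16_t;

/* BF16 is the upper 16 bits of a float32, so widening is a 16-bit shift. */
static float bf16_to_f32(bf16_t h) {
    uint32_t u = (uint32_t)h.bits << 16;
    float f;
    memcpy(&f, &u, sizeof f);   /* bit copy, avoids aliasing issues */
    return f;
}

/* Truncating conversion (drops the low 16 bits, no rounding). */
static bf16_t f32_to_bf16_trunc(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    bf16_t h = { (uint16_t)(u >> 16) };
    return h;
}

/* Dot product using two independent accumulators (assumed reading of
 * "double accumulator"): even indices feed acc0, odd indices feed acc1. */
static float bf16_dot(const bf16_t *x, const bf16_t *y, size_t n) {
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += bf16_to_f32(x[i])     * bf16_to_f32(y[i]);
        acc1 += bf16_to_f32(x[i + 1]) * bf16_to_f32(y[i + 1]);
    }
    if (i < n) {                /* leftover element when n is odd */
        acc0 += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    }
    return acc0 + acc1;         /* combine the partial sums once at the end */
}
```

A caller would convert its float inputs with `f32_to_bf16_trunc` and pass the two arrays and their length to `bf16_dot`. In a SIMD version each accumulator would be a vector register rather than a scalar, but the structure (independent partial sums, one final reduction) is the same.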
ggml-amx ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-blas ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-cann ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-cpu AVX BF16 and single scale quant optimizations (#10212) 2024-11-15 12:47:58 +01:00
ggml-cuda ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-hip ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-kompute ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-metal ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-musa ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-rpc ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-sycl sycl: Use syclcompat::dp4a (#10267) 2024-11-15 11:09:12 +08:00
ggml-vulkan ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
CMakeLists.txt ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-aarch64.c ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-aarch64.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-alloc.c ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) 2024-10-16 11:28:01 +03:00
ggml-backend-impl.h llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
ggml-backend-reg.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-backend.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-common.h ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) 2024-09-05 21:48:47 -04:00
ggml-impl.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-quants.c ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml.c ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00