Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-26 14:20:31 +01:00)
Commit 18429220bd:

* use 128-bit loads (I've tried 256->128 to death and it's slower)
* double accumulator
* AVX bf16 vec dot
* +3% q4_0 inference
* +7% tg, +5% pp compared to master
* slower F16C version, kept for reference
* 256-bit version, also slow. I tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16-bit add for q4_0 only
* merge
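The bf16-related bullets above (128-bit loads, a doubled accumulator, FMA/madd) can be illustrated with a small sketch. This is not the actual ggml implementation: the function and helper names below are made up, and the "double accumulator" bullet is interpreted here as two independent FMA accumulators, which is an assumption. It only sketches the bf16 vec dot idea, not the q4_0/Q8_0/IQ4_NL paths.

```c
// Minimal sketch of a bf16 dot product on AVX2/FMA: 128-bit loads of bf16
// data widened to f32, two independent accumulators, and fused multiply-add.
// Illustrative only; not the ggml code from this commit.
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

// Widen 8 bf16 values (128 bits) to 8 f32 values (256 bits): bf16 is the
// upper 16 bits of an IEEE-754 float, so zero-extend and shift left by 16.
static inline __m256 bf16x8_to_f32x8(const uint16_t *p) {
    __m128i v16 = _mm_loadu_si128((const __m128i *)p);   // 128-bit load
    __m256i v32 = _mm256_cvtepu16_epi32(v16);            // zero-extend to 32-bit lanes
    return _mm256_castsi256_ps(_mm256_slli_epi32(v32, 16));
}

// Dot product of two bf16 vectors of length n (assumed to be a multiple of 16).
static float vec_dot_bf16_sketch(size_t n, const uint16_t *x, const uint16_t *y) {
    __m256 acc0 = _mm256_setzero_ps();  // two independent accumulators so the
    __m256 acc1 = _mm256_setzero_ps();  // FMA dependency chains can overlap

    for (size_t i = 0; i < n; i += 16) {
        acc0 = _mm256_fmadd_ps(bf16x8_to_f32x8(x + i),     bf16x8_to_f32x8(y + i),     acc0);
        acc1 = _mm256_fmadd_ps(bf16x8_to_f32x8(x + i + 8), bf16x8_to_f32x8(y + i + 8), acc1);
    }

    // Horizontal sum of both accumulators down to a single float.
    __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 s   = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```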
| Name |
|---|
| include |
| src |
| .gitignore |
| CMakeLists.txt |