llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-01-27 04:23:06 +01:00


Jeff Bolz 772703c8ff vulkan: Optimize some mat-vec mul quant shaders (#10296)	2024-11-16 07:26:57 +01:00

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
include	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)	2024-11-15 01:28:50 +01:00
src	vulkan: Optimize some mat-vec mul quant shaders (#10296)	2024-11-16 07:26:57 +01:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)	2024-11-15 01:28:50 +01:00