llama.cpp/ggml
Jeff Bolz 772703c8ff
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
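
The row-pairing idea above can be sketched on the CPU in plain C++. This is an illustration only, not the Vulkan shader from this commit: `matvec_two_rows` is a hypothetical name, the data is unquantized float, and the guard shown is on the second row of a pair (the commit message does not say which loop its bounds check protects). Each element of the B vector is loaded once and used for both rows, and the two accumulators are written out by hand, which is the manual partial unroll of the outer (row) loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the commit): y = A * b, with A stored
// row-major as rows x cols. Two output rows are produced per outer step.
std::vector<float> matvec_two_rows(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; r += 2) {
        // Bounds check: with an odd row count the last pair has only one row.
        const bool has_second = (r + 1 < rows);
        const float* row0 = a.data() + r * cols;
        // Alias row0 when there is no second row so every read stays in range.
        const float* row1 = has_second ? row0 + cols : row0;

        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float bv = b[c];   // loaded once, reused for both rows
            acc0 += row0[c] * bv;
            acc1 += row1[c] * bv;
        }
        y[r] = acc0;
        if (has_second) {
            y[r + 1] = acc1;
        }
    }
    return y;
}
```

Calling it with a 3x2 matrix exercises the guarded last pair, since the third row has no partner.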

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
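
As a rough illustration of how wider loads can cut down on bit twiddling, the C++ sketch below splits eight packed 4-bit values out of one 32-bit word with two masks and a single shift, instead of shifting and masking each byte separately. It is an assumption about the general technique, not the Q4_K shader code: `Nibbles`, `split_nibbles`, and `lane` are hypothetical names, and the real Q4_K block layout (scales, mins, super-blocks) is not modeled.

```cpp
#include <cstdint>

// Hypothetical container: each field holds four 4-bit values,
// one per byte lane of the 32-bit word.
struct Nibbles {
    std::uint32_t lo;  // low nibble of each of the four packed bytes
    std::uint32_t hi;  // high nibble of each of the four packed bytes
};

// Split one 32-bit load into low and high nibbles for all four bytes at once.
inline Nibbles split_nibbles(std::uint32_t packed) {
    return Nibbles{packed & 0x0F0F0F0Fu, (packed >> 4) & 0x0F0F0F0Fu};
}

// Read byte lane i (0 = least significant byte) from a split word.
inline std::uint32_t lane(std::uint32_t word, unsigned i) {
    return (word >> (8u * i)) & 0xFFu;
}
```

The per-byte approach would need a shift and a mask for each of the eight values; here the two masked words cover all eight, and the lanes can then feed the dot product directly.
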
2024-11-16 07:26:57 +01:00
include         backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)  2024-11-15 01:28:50 +01:00
src             vulkan: Optimize some mat-vec mul quant shaders (#10296)                 2024-11-16 07:26:57 +01:00
.gitignore      vulkan : cmake integration (#8119)                                       2024-07-13 18:12:39 +02:00
CMakeLists.txt  backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)  2024-11-15 01:28:50 +01:00