llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00


vulkan: Optimize some mat-vec mul quant shaders (#10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
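The row-pairing idea can be sketched on the CPU in plain C. This is only an illustration under simplifying assumptions: it uses unquantized `float` data instead of Q{4,5}_{0,1} blocks, a hypothetical function name `matvec_two_rows`, and a sequential loop in place of a GPU workgroup. It is not the shader code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from llama.cpp): y = A * x for a row-major
 * float matrix A of size nrows x ncols. Two output rows are produced per
 * outer iteration, so each x[c] is loaded once and used for both rows. */
static void matvec_two_rows(const float *a, const float *x, float *y,
                            size_t nrows, size_t ncols)
{
    assert(a && x && y);
    for (size_t r = 0; r < nrows; r += 2) {
        /* The second row does not exist when nrows is odd. */
        const int has_second = (r + 1 < nrows);
        const float *row0 = a + r * ncols;
        /* Alias row0 when there is no second row, so reads stay in bounds. */
        const float *row1 = has_second ? row0 + ncols : row0;
        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (size_t c = 0; c < ncols; ++c) {
            const float b = x[c];   /* one load of the vector element */
            acc0 += row0[c] * b;    /* used for row r */
            acc1 += row1[c] * b;    /* and reused for row r + 1 */
        }
        y[r] = acc0;
        if (has_second) {
            y[r + 1] = acc1;
        }
    }
}
```

Writing the two accumulators out by hand corresponds to the manual partial unrolling of the outer loop described above: the index arithmetic for `x` is done once per column rather than once per row and column.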

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
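One common way to bounds-check only the last iteration looks roughly like the sketch below. The step size of 4, the `float` dot product, and the name `dot_checked_tail` are all assumptions made for illustration; the commit does not spell out how the shader's loop is structured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from llama.cpp): dot product processed in
 * steps of STEP elements. Full steps run without per-element checks;
 * only the final step tests each index against n. */
static float dot_checked_tail(const float *a, const float *b, size_t n)
{
    enum { STEP = 4 };
    const size_t num_iters = (n + STEP - 1) / STEP;
    float acc = 0.0f;

    assert(n == 0 || (a && b));
    if (num_iters == 0) {
        return acc;
    }
    /* All iterations except the last are guaranteed to be in bounds. */
    for (size_t i = 0; i + 1 < num_iters; ++i) {
        const size_t base = i * STEP;
        acc += a[base + 0] * b[base + 0] + a[base + 1] * b[base + 1]
             + a[base + 2] * b[base + 2] + a[base + 3] * b[base + 3];
    }
    /* Last iteration: it may be partial, so check every element. */
    const size_t base = (num_iters - 1) * STEP;
    for (size_t k = 0; k < STEP; ++k) {
        if (base + k < n) {
            acc += a[base + k] * b[base + k];
        }
    }
    return acc;
}
```

Keeping the checks out of the main loop means the common case pays nothing for them, while an input whose length is not a multiple of the step size is still handled without reading past the end.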

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
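As a rough illustration of what vectorized loads with fewer bit operations can mean for 4-bit data, the sketch below unpacks four packed bytes at once: one 32-bit load and two mask operations replace four byte loads and eight shift/mask operations. The function name `unpack_nibbles_x4` is hypothetical, and the example ignores the scales and minimums that the real Q4_K format carries, so it should not be read as the shader's actual logic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from llama.cpp): split four packed bytes into
 * their low and high 4-bit values, handling all four byte lanes at once. */
static void unpack_nibbles_x4(const uint8_t packed[4],
                              uint8_t lo[4], uint8_t hi[4])
{
    uint32_t w;

    assert(packed && lo && hi);
    memcpy(&w, packed, sizeof w);                  /* one 32-bit load */

    /* One mask extracts the low nibble of every byte lane. */
    const uint32_t lo_w = w & 0x0F0F0F0Fu;
    /* One shift plus one mask extracts every high nibble; bits that
     * cross over from the neighbouring byte are masked away. */
    const uint32_t hi_w = (w >> 4) & 0x0F0F0F0Fu;

    memcpy(lo, &lo_w, sizeof lo_w);
    memcpy(hi, &hi_w, sizeof hi_w);
}
```

Because the word is copied in and out with `memcpy`, each output byte lines up with its input byte regardless of the host's byte order.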

2024-11-16 07:26:57 +01:00

include

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)

2024-11-15 01:28:50 +01:00

src

vulkan: Optimize some mat-vec mul quant shaders (#10296)

2024-11-16 07:26:57 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)

2024-11-15 01:28:50 +01:00