Mirror of https://github.com/ggerganov/llama.cpp.git
Synced 2025-01-09 03:58:57 +01:00
Commit 716bd6dec3
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.

| Name |
|---|
| include |
| src |
| .gitignore |
| CMakeLists.txt |
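
The loop structure described in the commit message can be illustrated with a small CPU-side sketch. This is not the shader code from the commit: the function names, the dense `float` layout, and the use of a template parameter to stand in for the `NUM_COLS` specialization constant are all assumptions made for illustration. It shows a matrix-vector product extended to `NUM_COLS` columns of B, with the B loads in the innermost loop and a shared helper that writes the accumulated results to memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: stands in for the shared "write results to memory"
// step. Each column j of the output is stored at offset j * d_col_stride.
template <std::size_t NUM_COLS>
void store_row_results(const std::array<float, NUM_COLS>& acc,
                       std::vector<float>& d,
                       std::size_t row,
                       std::size_t d_col_stride) {
    for (std::size_t j = 0; j < NUM_COLS; ++j) {
        d[j * d_col_stride + row] = acc[j];
    }
}

// Hypothetical sketch: D = A * B where A is rows x k (row-major) and B holds
// NUM_COLS vectors of length k, with column j starting at j * b_col_stride.
// NUM_COLS is a compile-time constant here, mimicking a specialization
// constant in a shader.
template <std::size_t NUM_COLS>
std::vector<float> mul_mat_vec_sketch(const std::vector<float>& a,
                                      std::size_t rows,
                                      std::size_t k,
                                      const std::vector<float>& b,
                                      std::size_t b_col_stride,
                                      std::size_t d_col_stride) {
    assert(a.size() >= rows * k);
    assert(b_col_stride >= k && b.size() >= (NUM_COLS - 1) * b_col_stride + k);
    assert(d_col_stride >= rows);

    std::vector<float> d(NUM_COLS * d_col_stride, 0.0f);
    for (std::size_t row = 0; row < rows; ++row) {
        std::array<float, NUM_COLS> acc{};  // one accumulator per B column
        for (std::size_t i = 0; i < k; ++i) {
            const float a_val = a[row * k + i];  // load A once per element
            // Innermost loop: the B loads, one per column, reuse a_val.
            for (std::size_t j = 0; j < NUM_COLS; ++j) {
                acc[j] += a_val * b[j * b_col_stride + i];
            }
        }
        store_row_results<NUM_COLS>(acc, d, row, d_col_stride);
    }
    return d;
}
```

With `NUM_COLS = 1` this reduces to an ordinary matrix-vector product. For larger values, each element of A is loaded once and reused across all columns of B, which is one plausible reading of why the commit places the B loads innermost.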