llama.cpp/ggml
Jeff Bolz 716bd6dec3
vulkan: optimize mul_mat for small values of N (#10991)
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
2024-12-30 18:27:11 +01:00
include tts : add OuteTTS support (#10784) 2024-12-18 19:27:21 +02:00
src vulkan: optimize mul_mat for small values of N (#10991) 2024-12-30 18:27:11 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : fix arm build (#10890) 2024-12-18 23:21:42 +01:00
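
The latest commit message describes multiplying a matrix by a small number of B columns with the column count fixed at compile time and the B loads placed in the innermost loop. Below is a minimal CPU-side sketch of that loop ordering, not the actual Vulkan shader. The function name `mul_mat_small_n`, the memory layouts, and the use of a template parameter in place of the `NUM_COLS` spec constant are all assumptions made for illustration. The workgroup-level reduction that the commit shares in mul_mat_vec_base is not modeled here.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue of a mat-vec kernel generalized to NUM_COLS columns.
// NUM_COLS is a template parameter here, standing in for a shader spec constant.
// Assumed layouts (not taken from the source):
//   a: m x k, row-major            -> a[row * k + i]
//   b: NUM_COLS columns of length k, each column contiguous -> b[col * k + i]
//   d: NUM_COLS columns of length m, each column contiguous -> d[col * m + row]
template <std::size_t NUM_COLS>
std::vector<float> mul_mat_small_n(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t m, std::size_t k) {
    std::vector<float> d(m * NUM_COLS, 0.0f);
    for (std::size_t row = 0; row < m; ++row) {
        // One accumulator per output column, zero-initialized.
        std::array<float, NUM_COLS> acc{};
        for (std::size_t i = 0; i < k; ++i) {
            // Load the A element once and reuse it for every column.
            const float a_val = a[row * k + i];
            // Loads from B sit in the innermost loop.
            for (std::size_t col = 0; col < NUM_COLS; ++col) {
                acc[col] += a_val * b[col * k + i];
            }
        }
        // Write the per-column results for this row.
        for (std::size_t col = 0; col < NUM_COLS; ++col) {
            d[col * m + row] = acc[col];
        }
    }
    return d;
}
```

Calling `mul_mat_small_n<2>(a, b, 2, 2)` with a 2x2 matrix `a` and two length-2 columns in `b` returns four values, the first output column followed by the second. Because the column count is known at compile time, the inner loop can be fully unrolled by the compiler, which is the same motivation as making it a spec constant in a shader.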