llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-27 04:23:06 +01:00

0cc4m 3df784b305 Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597)	2024-12-07 10:24:15 +01:00

* Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader
* Improve performance with better q4_k and q5_k dequant and store unrolling
* Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection
* Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device
* Vulkan: Implement accumulator switch for specific mul mat mat shaders
* Vulkan: Unroll more loops for more mul mat mat performance
* Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic
* Disable coopmat support on AMD proprietary driver
* Remove redundant checks
* Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support
* Fix rebase typo
* Fix coopmat2 MUL_MAT_ID pipeline selection

..
include	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034)	2024-12-05 13:27:31 +02:00
src	Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597)	2024-12-07 10:24:15 +01:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : add predefined list of CPU backend variants to build (#10626)	2024-12-04 14:45:40 +01:00