llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-27 12:33:06 +01:00

Jeff Bolz af148c9386 vulkan: Optimize binary ops (#10270) Reuse the index calculations across all of src0/src1/dst. Add a shader variant for when src0/src1 are the same dimensions and additional modulus for src1 aren't needed. Div/mod are slow, so add "fast" div/mod that have a fast path when the calculation isn't needed or can be done more cheaply.	2024-11-14 06:22:55 +01:00
cmake	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
include	metal : optimize FA kernels (#10171)	2024-11-08 13:47:22 +02:00
src	vulkan: Optimize binary ops (#10270)	2024-11-14 06:22:55 +01:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	metal : opt-in compile flag for BF16 (#10218)	2024-11-08 21:59:46 +02:00