llama.cpp/ggml
Jeff Bolz af148c9386
vulkan: Optimize binary ops (#10270)
Reuse the index calculations across all of src0/src1/dst. Add a shader
variant for when src0/src1 have the same dimensions and the additional modulus
operations for src1 aren't needed. Div/mod are slow, so add "fast" div/mod
helpers that take a fast path when the calculation isn't needed or can be
done more cheaply.
2024-11-14 06:22:55 +01:00
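
The "fast" div/mod idea from the commit message can be sketched in plain C. This is an illustration only, not the shader code from the commit: the names `fast_div`, `fast_mod`, and `check_equivalence` are hypothetical, the divisor is assumed to be nonzero, and the specific fast paths shown (operand already smaller than the divisor, power-of-two divisor) are assumed examples of cases where the calculation "isn't needed or can be done more cheaply".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a / b, assuming b > 0.
 * When a < b the quotient is 0, so the division can be skipped. */
static uint32_t fast_div(uint32_t a, uint32_t b) {
    return (a < b) ? 0u : (a / b);
}

/* Hypothetical helper: a % b, assuming b > 0. */
static uint32_t fast_mod(uint32_t a, uint32_t b) {
    if (a < b) {
        return a;            /* already reduced: no modulus needed */
    }
    if ((b & (b - 1u)) == 0u) {
        return a & (b - 1u); /* power-of-two divisor: a bit mask suffices */
    }
    return a % b;            /* general case: fall back to the slow path */
}

/* Sanity check: the fast helpers must agree with the plain operators. */
static void check_equivalence(uint32_t a, uint32_t b) {
    assert(b > 0u);
    assert(fast_div(a, b) == a / b);
    assert(fast_mod(a, b) == a % b);
}
```

Calling `check_equivalence(13, 8)` exercises the power-of-two mask, while `check_equivalence(5, 8)` exercises the early-out paths. Whether such branches pay off depends on how often the cheap cases occur, which is presumably why the commit also adds a separate shader variant for the case where src0 and src1 have the same dimensions.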
cmake llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
include metal : optimize FA kernels (#10171) 2024-11-08 13:47:22 +02:00
src vulkan: Optimize binary ops (#10270) 2024-11-14 06:22:55 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt metal : opt-in compile flag for BF16 (#10218) 2024-11-08 21:59:46 +02:00