llama.cpp/ggml

Latest commit: 808aba3916 by Johannes Gäßler, 2024-07-11 16:47:47 +02:00
CUDA: optimize and refactor MMQ (#8416)
  * CUDA: optimize and refactor MMQ
  * explicit q8_1 memory layouts, add documentation
Name                         Last commit                                                        Date
cmake/                       llama : reorganize source code + improve CMake (#8006)             2024-06-26 18:33:02 +03:00
include/                     ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)      2024-07-10 15:14:51 +03:00
src/                         CUDA: optimize and refactor MMQ (#8416)                            2024-07-11 16:47:47 +02:00
CMakeLists.txt               ggml : move sgemm sources to llamafile subfolder (#8394)           2024-07-10 15:23:29 +03:00
ggml_vk_generate_shaders.py  py : type-check all Python scripts with Pyright (#8341)            2024-07-07 15:04:39 -04:00