llama.cpp/ggml

Latest commit: 808aba3916 by Johannes Gäßler, 2024-07-11 16:47:47 +02:00
CUDA: optimize and refactor MMQ (#8416)
  * CUDA: optimize and refactor MMQ
  * explicit q8_1 memory layouts, add documentation
Name                         Last commit                                                        Date
cmake/                       llama : reorganize source code + improve CMake (#8006)             2024-06-26 18:33:02 +03:00
include/                     ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)      2024-07-10 15:14:51 +03:00
src/                         CUDA: optimize and refactor MMQ (#8416)                            2024-07-11 16:47:47 +02:00
CMakeLists.txt               ggml : move sgemm sources to llamafile subfolder (#8394)           2024-07-10 15:23:29 +03:00
ggml_vk_generate_shaders.py  py : type-check all Python scripts with Pyright (#8341)            2024-07-07 15:04:39 -04:00