llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 06:39:25 +01:00

Author	SHA1	Message	Date
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00