llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-02-04 07:33:54 +01:00

Author	SHA1	Message	Date
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	2024-03-25 13:50:23 +01:00
Cebtenzzre	00d62adb79	fix some warnings from gcc and clang-tidy (#3038) Co-authored-by: xaedes <xaedes@gmail.com>	2023-09-07 13:22:29 -04:00
Georgi Gerganov	53aba3f393	clang-tidy : restore dot file from accidental deletion	2023-06-08 10:09:08 +03:00
Kawrakow	4161bdc04d	metal : add Q4_K implementation (#1733) * Metal implementation for Q4_K Very slow for now: 42 ms / token, Q4_0 runs in 28 ms/token on my 30-core M2 Max GPU. * Optimizing Q4_K on metal The first token always takes longer, I guess because the metal kernel is being jit-compiled. So, using n = 128 to measure time. At this point Q4_K takes 29.5 ms / token compared to 27.2 ms / token for Q4_0. Quite a bit better than the initial attempt, but still not good enough. * Optimizing q4_K metal dot some more For n = 256 it is now 28.1 ms/token compared to 27 ms/token for q4_0. * Fix after merge with master --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-06-08 10:08:23 +03:00
slaren	553fd4d4b5	Add clang-tidy reviews to CI (#1407)	2023-05-12 15:40:53 +02:00