llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-01 07:30:17 +01:00

Author	SHA1	Message	Date
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716) * CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits	2024-06-05 16:53:00 +02:00
Georgi Gerganov	e84b71c2c6	ggml : drop support for QK_K=64 (#7473) * ggml : drop support for QK_K=64 ggml-ci * opencl : restore QK_K=256 define	2024-05-23 10:00:21 +03:00
Carolinabanana	5dc9dd7152	llama : add Command R Plus support (#6491) * Add Command R Plus GGUF * Add Command R Plus GGUF * Loading works up to LayerNorm2D * Export new tensors in 1D so they are not quantized. * Fix embedding layer based on Noeda's example * Whitespace * Add line * Fix unexpected tokens on MPS. Re-add F16 fix. ((Noeda) * dranger003: Fix block index overflow in CUDA dequantizing. * Reverted blocked multiplication code as it still has issues and could affect other Llama arches * export norms as f32 * fix overflow issues during quant and other cleanup * Type convention Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * dranger003: Fix more int overflow during quant. --------- Co-authored-by: S <seast@Ss-Mac-Studio.local> Co-authored-by: S <s@example.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 11:16:13 +03:00
Georgi Gerganov	d48ccf3ad4	sync : ggml (#6351) * sync : ggml ggml-ci * cuda : move GGML_CUDA_DMMV constants to dmmv.cuh --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-03-29 17:45:46 +02:00
slaren	2f34b865b6	cuda : fix LLAMA_CUDA_F16 build (#6298)	2024-03-25 16:43:22 +02:00
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	2024-03-25 13:50:23 +01:00

6 Commits