mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-25 13:58:46 +01:00)
08a0c02060
* ggml : update mul_mat_id to use the same tensor for all the experts
* update cuda
* minor
* update metal
* update test-backend-ops
* fix cuda
* Update ggml-metal.m
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update convert.py
* update convert-hf-to-gguf.py
* update convert.py for mixtral hf models
* Update convert-hf-to-gguf.py
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cuda : support non-pow-2 number of experts
* allow quantize to work for split and merged experts models in the same way
* cleanup + disable mmap automatically with split tensors models
* update imatrix
* test-backend-ops : test qwen argsort
* update grok model loading
* llama : add merged experts tensors to the grok tensor map
* minor
* gguf : bump version
* fix quantizing of merged experts
* convert-hf-to-gguf.py : update grok (untested)
* make linter happy
* cuda/argsort : use shared memory instead of pool memory
* convert : fix grok tensor names
* metal : add support for non-pow-2 argsort
* llama : more loader cleanup, better error checking
* cuda : fix warning
* llama : still use mmap for loading old models, but copy the data to a host buffer
* add review note
* llama : remove ffn tensor counting + add sanity check
  ggml-ci
* convert : fix handling of n_experts == None
  ggml-ci
* imatrix : fix ncall counters
* llama : produce error if imatrix size does not match
* quantize : terminate on errors + trace logs
  ggml-ci
* metal : pad shared memory to 16 bytes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

acc.cu
acc.cuh
alibi.cu
alibi.cuh
arange.cu
arange.cuh
argsort.cu
argsort.cuh
binbcast.cu
binbcast.cuh
clamp.cu
clamp.cuh
common.cuh
concat.cu
concat.cuh
convert.cu
convert.cuh
cpy.cu
cpy.cuh
dequantize.cuh
diagmask.cu
diagmask.cuh
dmmv.cu
dmmv.cuh
getrows.cu
getrows.cuh
im2col.cu
im2col.cuh
mmq.cu
mmq.cuh
mmvq.cu
mmvq.cuh
norm.cu
norm.cuh
pad.cu
pad.cuh
pool2d.cu
pool2d.cuh
quantize.cu
quantize.cuh
rope.cu
rope.cuh
scale.cu
scale.cuh
softmax.cu
softmax.cuh
sumrows.cu
sumrows.cuh
tsembd.cu
tsembd.cuh
unary.cu
unary.cuh
upscale.cu
upscale.cuh
vecdotq.cuh