mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2025-01-14 14:28:58 +01:00
0d56246f4b
* ggml : group all experts in a single ggml_mul_mat_id
  cuda : improve mmid row copy
* cuda : fix bin bcast with non-cont src0
* test-backend-ops : only run all mul mat tests for base types
* llama : disable moe offloading with SYCL

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
.gitignore
CMakeLists.txt
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-autorelease.cpp
test-backend-ops.cpp
test-c.c
test-chat-template.cpp
test-double-float.cpp
test-grad0.cpp
test-grammar-integration.cpp
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp
test-llama-grammar.cpp
test-model-load-cancel.cpp
test-opt.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp
test-rope.cpp
test-sampling.cpp
test-tokenizer-0-falcon.cpp
test-tokenizer-0-falcon.py
test-tokenizer-0-llama.cpp
test-tokenizer-0-llama.py
test-tokenizer-1-bpe.cpp
test-tokenizer-1-llama.cpp