mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-10-30 06:30:15 +01:00
b9e74f9bca
* phi2 implementation * fix breaking change * phi-2 : various fixes * phi-2 : use layer norm eps * py : whitespaces * llama : fix meta KV override bug * convert : phi don't add BOS token * convert : revert "added_tokens_decoder" change * phi-2 : scale Q instead of KQ for better precision * ggml : fix NeoX rope to rotate just first n_dims * cuda : less diff in the rope_neox kernel * ggml : add ggml_mul_mat_set_prec ggml-ci * Update ggml-cuda.cu Co-authored-by: slaren <slarengh@gmail.com> * Update ggml-cuda.cu Co-authored-by: slaren <slarengh@gmail.com> * cuda : ggml_cuda_op_mul_mat_cublas support F32 precision * cuda : remove oboslete comment --------- Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: slaren <slarengh@gmail.com> |
||
---|---|---|
.. | ||
CMakeLists.txt | ||
test-backend-ops.cpp | ||
test-c.c | ||
test-double-float.cpp | ||
test-grad0.cpp | ||
test-grammar-parser.cpp | ||
test-llama-grammar.cpp | ||
test-opt.cpp | ||
test-quantize-fns.cpp | ||
test-quantize-perf.cpp | ||
test-rope.cpp | ||
test-sampling.cpp | ||
test-tokenizer-0-falcon.cpp | ||
test-tokenizer-0-falcon.py | ||
test-tokenizer-0-llama.cpp | ||
test-tokenizer-0-llama.py | ||
test-tokenizer-1-bpe.cpp | ||
test-tokenizer-1-llama.cpp |