mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-26 06:10:29 +01:00
37bef89433
* Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t |
||
---|---|---|
.. | ||
.gitignore | ||
CMakeLists.txt | ||
get-model.cpp | ||
get-model.h | ||
run-json-schema-to-grammar.mjs | ||
test-autorelease.cpp | ||
test-backend-ops.cpp | ||
test-c.c | ||
test-chat-template.cpp | ||
test-double-float.cpp | ||
test-grad0.cpp | ||
test-grammar-integration.cpp | ||
test-grammar-parser.cpp | ||
test-json-schema-to-grammar.cpp | ||
test-llama-grammar.cpp | ||
test-model-load-cancel.cpp | ||
test-opt.cpp | ||
test-quantize-fns.cpp | ||
test-quantize-perf.cpp | ||
test-rope.cpp | ||
test-sampling.cpp | ||
test-tokenizer-0.cpp | ||
test-tokenizer-0.py | ||
test-tokenizer-0.sh | ||
test-tokenizer-1-bpe.cpp | ||
test-tokenizer-1-spm.cpp | ||
test-tokenizer-random.py |