llama.cpp/src
Michael Coppola 940362224d
llama : add support for Tekken pre-tokenizer (#8579)
* llama : Added support for Tekken pre-tokenizer (#8577)

Removed unneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-20 16:43:51 +03:00
..
CMakeLists.txt | tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) | 2024-07-04 13:53:42 +03:00
llama.cpp | llama : add support for Tekken pre-tokenizer (#8579) | 2024-07-20 16:43:51 +03:00
unicode-data.cpp | Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) | 2024-07-02 12:18:10 -04:00
unicode-data.h | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
unicode.cpp | msvc : silence codecvt c++17 deprecation warnings (#8395) | 2024-07-10 14:40:53 +03:00
unicode.h | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00