llama.cpp/src

Latest commit 2a24c8caa6 by Yoshi Suhara
Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)
* Add nemotron GGUF conversion & inference support

* Fix formatting issues

* Remove unnecessary write_tensors()

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Address comments by @compilade

* Replace ggml_mul_mat() with llm_build_lora_mm()

* Remove mutable variable

* Use  for bias tensors

* Cover corner case for rope_scaling not in config.json (see the sketch below)

---------

Co-authored-by: compilade <git@compilade.net>
2024-08-16 04:23:33 +02:00
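The last bullet above concerns checkpoints whose config.json does not declare a rope_scaling entry at all. Below is a minimal sketch of that guard pattern; the helper name rope_scaling_params, the default values, and the example configs are illustrative assumptions, not the actual convert_hf_to_gguf.py code.

```python
# Sketch of tolerating a missing "rope_scaling" key in an HF config.json.
# Helper name, defaults, and example configs are illustrative only.

def rope_scaling_params(hparams: dict) -> tuple[str, float]:
    """Return (scaling_type, factor), even when "rope_scaling" is absent or null."""
    rope_scaling = hparams.get("rope_scaling") or {}
    return rope_scaling.get("type", "none"), rope_scaling.get("factor", 1.0)


if __name__ == "__main__":
    # Placeholder configs: one without rope_scaling (the corner case), one with it.
    cfg_without = {"hidden_size": 4096, "num_hidden_layers": 32}
    cfg_with = {"rope_scaling": {"type": "linear", "factor": 4.0}}

    print(rope_scaling_params(cfg_without))  # ('none', 1.0)
    print(rope_scaling_params(cfg_with))     # ('linear', 4.0)
```

Using .get("rope_scaling") or {} also covers configs that set "rope_scaling": null, which json.load turns into None.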
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| CMakeLists.txt | llama : move vocab, grammar and sampling into separate files (#8508) | 2024-07-23 13:10:17 +03:00 |
| llama-grammar.cpp | ggml : reduce hash table reset cost (#8698) | 2024-07-27 04:41:55 +02:00 |
| llama-grammar.h | llama : fix build + fix fabs compile warnings (#8683) | 2024-07-25 19:57:31 +03:00 |
| llama-impl.h | llama : better replace_all (cont) (#8926) | 2024-08-09 18:23:52 +03:00 |
| llama-sampling.cpp | Fix a spelling mistake (#9001) | 2024-08-12 11:46:03 +02:00 |
| llama-sampling.h | llama : move vocab, grammar and sampling into separate files (#8508) | 2024-07-23 13:10:17 +03:00 |
| llama-vocab.cpp | common : remove duplicate function llama_should_add_bos_token (#8778) | 2024-08-15 10:23:23 +03:00 |
| llama-vocab.h | common : remove duplicate function llama_should_add_bos_token (#8778) | 2024-08-15 10:23:23 +03:00 |
| llama.cpp | Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922) | 2024-08-16 04:23:33 +02:00 |
| unicode-data.cpp | Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) | 2024-07-02 12:18:10 -04:00 |
| unicode-data.h | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| unicode.cpp | llama : move vocab, grammar and sampling into separate files (#8508) | 2024-07-23 13:10:17 +03:00 |
| unicode.h | llama : move vocab, grammar and sampling into separate files (#8508) | 2024-07-23 13:10:17 +03:00 |