llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00


Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)

* Add nemotron GGUF conversion & inference support

* Fix formatting issues

* Remove unnecessary write_tensors()

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Address comments by @compilade

* Replace ggml_mul_mat()->llm_build_lora_mm()

* Remove mutable variable

* Use  for bias tensors

* Cover corner case for role_scaling not in config.json

---------

Co-authored-by: compilade <git@compilade.net>

2024-08-16 04:23:33 +02:00

__init__.py

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

2024-07-18 20:40:15 +10:00

constants.py

Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)

2024-08-16 04:23:33 +02:00

gguf_reader.py

py : type-check all Python scripts with Pyright (#8341)

2024-07-07 15:04:39 -04:00

gguf_writer.py

Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)