llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00


gguf-py : simplify support for quant types (#8838)

* gguf-py : use classes for quants

* convert_hf : simplify internal quantization type selection

* gguf-py : fix flake8 lint

* gguf-py : fix BF16 numpy view type

* gguf-py : remove LlamaFileTypeMap

Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.

* gguf-py : add generic quantize and dequantize functions

The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
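
The dispatch idea described in this commit message can be sketched as follows. This is a minimal, standard-library-only illustration and not the gguf-py implementation: `QuantType`, `Quant`, `_REGISTRY`, and the simplified block-wise 8-bit scheme are all hypothetical names and assumptions chosen for the example. It shows how a caller can pass only a type to generic `quantize` and `dequantize` functions without importing any quant class.

```python
from enum import Enum


class QuantType(Enum):
    """Hypothetical stand-in for a quantization type enum."""
    F32 = 0
    Q8_0 = 8


# Maps each type to the class that implements it (assumption for this sketch).
_REGISTRY = {}


class Quant:
    """Base class; subclasses register themselves under their type."""

    def __init_subclass__(cls, qtype, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.qtype = qtype
        _REGISTRY[qtype] = cls

    @classmethod
    def quantize(cls, values):
        raise NotImplementedError

    @classmethod
    def dequantize(cls, packed):
        raise NotImplementedError


class F32(Quant, qtype=QuantType.F32):
    """Pass-through: values are kept as floats."""

    @classmethod
    def quantize(cls, values):
        return list(values)

    @classmethod
    def dequantize(cls, packed):
        return list(packed)


class Q8_0(Quant, qtype=QuantType.Q8_0):
    """Simplified block-wise 8-bit scheme: one scale per block of 32 values."""

    BLOCK = 32

    @classmethod
    def quantize(cls, values):
        blocks = []
        for start in range(0, len(values), cls.BLOCK):
            chunk = values[start:start + cls.BLOCK]
            amax = max(abs(v) for v in chunk)
            scale = amax / 127.0
            if scale == 0.0:
                ints = [0] * len(chunk)
            else:
                # Clamp to the signed 8-bit range used by this sketch.
                ints = [max(-127, min(127, round(v / scale))) for v in chunk]
            blocks.append((scale, ints))
        return blocks

    @classmethod
    def dequantize(cls, packed):
        return [scale * q for scale, ints in packed for q in ints]


def quantize(values, qtype):
    """Generic entry point: only the target type needs to be known."""
    return _REGISTRY[qtype].quantize(values)


def dequantize(packed, qtype):
    """Generic entry point: only the source type needs to be known."""
    return _REGISTRY[qtype].dequantize(packed)


values = [i / 10 - 1.0 for i in range(64)]
packed = quantize(values, QuantType.Q8_0)
restored = dequantize(packed, QuantType.Q8_0)
```

Registering each class under its type keeps the lookup table in one place, so supporting a new type in this sketch means adding one subclass and leaving the callers untouched.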

2024-08-08 13:33:09 -04:00

__init__.py

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

2024-07-18 20:40:15 +10:00

constants.py

gguf-py : simplify support for quant types (#8838)

2024-08-08 13:33:09 -04:00

gguf_reader.py

py : type-check all Python scripts with Pyright (#8341)

2024-07-07 15:04:39 -04:00

gguf_writer.py

Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)