llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-02-09 09:48:16 +01:00

gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert-hf-to-gguf.py in the llama.cpp repository for an example of its usage.

Installation

pip install gguf

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.

scripts/gguf-dump.py — Dumps a GGUF file's metadata to the console.

scripts/gguf-set-metadata.py — Allows changing simple metadata values in a GGUF file by key.

scripts/gguf-convert-endian.py — Allows converting the endianness of GGUF files.

scripts/gguf-new-metadata.py — Copies a GGUF file with added/modified/removed metadata values.
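To make the metadata and endianness tools above more concrete, here is a minimal standard-library sketch that reads only the fixed-size header of a GGUF file. It does not use this package and is not how the scripts are implemented. The function name `read_gguf_header` and the returned dictionary keys are illustrative. It assumes GGUF format version 2 or later (64-bit counts), and the byte-order check is a heuristic rather than a guarantee.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file
HEADER_SIZE = 24      # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumes format version 2 or later)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic bytes")

    # Heuristic (an assumption of this sketch): real version numbers are
    # small, so a version whose low 16 bits are all zero was most likely
    # decoded with the wrong byte order. Try little-endian first.
    for prefix, name in (("<", "little"), (">", "big")):
        version, tensor_count, kv_count = struct.unpack_from(prefix + "IQQ", data, 4)
        if version & 0xFFFF:
            return {
                "version": version,
                "tensor_count": tensor_count,
                "kv_count": kv_count,
                "byte_order": name,
            }
    raise ValueError("could not determine byte order")
```

To try it on a real file, pass the first 24 bytes, for example `read_gguf_header(open("example.gguf", "rb").read(24))`. For anything beyond the header, scripts/gguf-dump.py is the better tool.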

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require upgrading pip. An outdated pip fails with a message saying that editable installation currently requires setup.py. In that case, upgrade pip to the latest version:

pip install --upgrade pip
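One way to confirm that an editable install took effect is to check which file Python resolves for the package. The sketch below uses only the standard library. The helper name `module_location` is illustrative and not part of this package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # Namespace packages and missing modules have no usable origin.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


# After `pip install --editable .`, this path is expected to point into
# the gguf-py working tree rather than into site-packages.
print(module_location("gguf"))
```

If this prints `None`, the package is not installed in the active environment.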

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

1. Bump the version in pyproject.toml.
2. Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.

git tag -a gguf-v1.0.0 -m "Version 1.0 release"

3. Push the tags.

git push origin --tags
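Because the workflow only reacts to tags in the expected format, it can help to check a tag name before pushing it. The following is a minimal sketch that assumes a strict MAJOR.MINOR.PATCH pattern. The exact pattern the workflow matches is not stated here, and the names `parse_release_tag` and `tag_for_version` are illustrative.

```python
import re
from typing import Optional, Tuple

# Assumed tag format: gguf-v<major>.<minor>.<patch>, digits only.
TAG_PATTERN = re.compile(r"gguf-v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) if the tag matches, otherwise None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def tag_for_version(version: str) -> str:
    """Build the tag name for a version string such as '1.0.0'."""
    tag = f"gguf-v{version}"
    if parse_release_tag(tag) is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tag
```

Calling `tag_for_version` with the version from pyproject.toml gives the tag name to pass to `git tag`, which keeps the two from drifting apart.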

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

1. Bump the version in pyproject.toml.
2. Build the package:

python -m build

3. Upload the generated distribution archives:

python -m twine upload dist/*
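Before uploading, it can be worth confirming that the build step produced both a source archive and a wheel for the version just bumped, since `dist/*` also matches leftovers from earlier builds. This is a minimal sketch, not part of the release process itself. The function names are illustrative, and the file name patterns assume the usual `name-version.tar.gz` and `name-version-*.whl` naming.

```python
from pathlib import Path
from typing import Dict, List


def find_artifacts(dist_dir: Path, name: str, version: str) -> Dict[str, List[Path]]:
    """Collect the sdist and wheel files in dist_dir for one release."""
    prefix = f"{name}-{version}"
    return {
        "sdist": sorted(dist_dir.glob(f"{prefix}.tar.gz")),
        "wheel": sorted(dist_dir.glob(f"{prefix}-*.whl")),
    }


def ready_to_upload(dist_dir: Path, name: str, version: str) -> bool:
    """True only if at least one sdist and one wheel exist for the version."""
    artifacts = find_artifacts(dist_dir, name, version)
    return bool(artifacts["sdist"]) and bool(artifacts["wheel"])
```

For example, `ready_to_upload(Path("dist"), "gguf", "1.0.0")` should be true after a clean `python -m build` of version 1.0.0. Removing stale files from dist/ first avoids uploading archives from an older version.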

TODO

Add tests
Include conversion scripts as command line entry points in this package.