gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert_hf_to_gguf.py as an example of how to use this package.
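
For a quick feel of the writing API, the minimal sketch below is in the spirit of examples/writer.py; the metadata key, tensor name, and output file name are only illustrative.

import numpy as np
from gguf import GGUFWriter

# Create a writer for a toy "llama"-architecture file.
writer = GGUFWriter("example.gguf", "llama")

# Add some metadata key-value pairs.
writer.add_block_count(12)
writer.add_uint32("answer", 42)

# Add a float32 tensor.
writer.add_tensor("tensor1", np.ones((32,), dtype=np.float32))

# Write the header, the metadata and the tensor data, then close the file.
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()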

Installation

pip install gguf

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate how to write a GGUF file. Note that the generated file cannot be used as a model.

gguf/scripts/gguf_dump.py — Dumps a GGUF file's metadata to the console (a minimal programmatic sketch of reading a file follows this list).

gguf/scripts/gguf_set_metadata.py — Allows changing simple metadata values in a GGUF file by key.

gguf/scripts/gguf_convert_endian.py — Allows converting the endianness of GGUF files.

gguf/scripts/gguf_new_metadata.py — Copies a GGUF file with added/modified/removed metadata values.
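
The package also exposes a GGUFReader for inspecting files programmatically. The sketch below is a rough, minimal equivalent of gguf_dump.py, assuming an example.gguf such as the one produced above.

from gguf import GGUFReader

reader = GGUFReader("example.gguf")

# Metadata: field names and their value types.
for key, field in reader.fields.items():
    print(key, [t.name for t in field.types])

# Tensors: name, quantization type and shape.
for tensor in reader.tensors:
    print(tensor.name, tensor.tensor_type.name, list(tensor.shape))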

Development

Maintainers who participate in the development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require upgrading your pip installation, with a message saying that editable installation currently requires setup.py. In this case, upgrade pip to the latest version:

pip install --upgrade pip

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

  1. Bump the version in pyproject.toml.
  2. Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
  3. Push the tags.
git push origin --tags

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

  1. Bump the version in pyproject.toml.
  2. Build the package:
python -m build
  3. Upload the generated distribution archives:
python -m twine upload dist/*

Run Unit Tests

From the root of this repository, you can run this command to run all the unit tests:

python -m unittest discover ./gguf-py -v

TODO

  • Include conversion scripts as command line entry points in this package.