gguf
This is a Python package for writing binary files in the GGUF (GGML Universal File) format.
See convert_hf_to_gguf.py for an example of its usage.
Installation
pip install gguf
API Examples/Simple Tools
- examples/writer.py: Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
- scripts/gguf_dump.py: Dumps a GGUF file's metadata to the console.
- scripts/gguf_set_metadata.py: Allows changing simple metadata values in a GGUF file by key.
- scripts/gguf_convert_endian.py: Allows converting the endianness of GGUF files.
- scripts/gguf_new_metadata.py: Copies a GGUF file with added/modified/removed metadata values.
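To give a rough idea of what the dump and endianness tools operate on, here is a minimal standard-library sketch that parses only the fixed-size GGUF header. It does not use this package's API. The layout is an assumption based on GGUF version 2 and later: a 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key/value counts. The function name `read_gguf_header` and the `byteorder` parameter are hypothetical, chosen for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file
HEADER_SIZE = 24      # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data, byteorder="<"):
    """Parse the fixed-size header from the first 24 bytes of a GGUF file.

    byteorder is "<" for little-endian files (the common case) or ">" for
    files that have been converted to big-endian.
    Assumes the GGUF v2+ layout with 64-bit counts.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic")
    # "I" = uint32, "Q" = uint64; an explicit byte order disables padding.
    version, tensor_count, kv_count = struct.unpack(
        byteorder + "IQQ", data[4:HEADER_SIZE]
    )
    return {
        "version": version,
        "tensor_count": tensor_count,
        "kv_count": kv_count,
    }
```

To try it on a real file, read the first 24 bytes in binary mode, for example `read_gguf_header(open("example.gguf", "rb").read(24))`. An implausibly large version number usually means the wrong byte order was assumed.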
Development
Maintainers who participate in development of this package are advised to install it in editable mode:
cd /path/to/llama.cpp/gguf-py
pip install --editable .
Note: This may require upgrading your pip installation. Older versions of pip fail with a message saying that editable installation currently requires setup.py.
In this case, upgrade pip to the latest version:
pip install --upgrade pip
Automatic publishing with CI
There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.
- Bump the version in pyproject.toml.
- Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
- Push the tags.
git push origin --tags
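Because the release is triggered by the tag name, it can help to check a tag locally before pushing it. The sketch below is one way to do that. The regular expression is an assumption derived from the gguf-vx.x.x format described above, not the workflow's actual filter, and `parse_release_tag` is a hypothetical helper name.

```python
import re

# Assumed tag format: "gguf-v" followed by MAJOR.MINOR.PATCH, digits only.
TAG_PATTERN = re.compile(r"gguf-v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag):
    """Return (major, minor, patch) for a well-formed tag, else None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

For example, `parse_release_tag("gguf-v1.0.0")` yields a version tuple, while a tag such as `v1.0.0` or `gguf-v1.0` is rejected with `None`.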
Manual publishing
If you want to publish the package manually for any reason, you need to have twine and build installed:
pip install build twine
Then, follow these steps to release a new version:
- Bump the version in pyproject.toml.
- Build the package:
python -m build
- Upload the generated distribution archives:
python -m twine upload dist/*
TODO
- Add tests
- Include conversion scripts as command line entry points in this package.