diff --git a/README.md b/README.md
index 438748a91..f029f06a8 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,39 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
 
 - [Roadmap May 2023](https://github.com/ggerganov/llama.cpp/discussions/1220)
 - [New quantization methods](https://github.com/ggerganov/llama.cpp#quantization)
+
+ Table of Contents
+   1. Description
+   2. Usage
+   3. Contributing
+   4. Coding guidelines
+   5. Docs
+
 ## Description
 
 The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quantization on a MacBook
@@ -46,6 +79,7 @@ as the main playground for developing new features for the [ggml](https://github
 - [X] [Vicuna](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5533894)
 - [X] [Koala](https://bair.berkeley.edu/blog/2023/04/03/koala/)
 - [X] [OpenBuddy 🐶 (Multilingual)](https://github.com/OpenBuddy/OpenBuddy)
+- [X] [Pygmalion 7B / Metharme 7B](#using-pygmalion-7b--metharme-7b)
 
 **Bindings:**
 
@@ -383,6 +417,19 @@ python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin
 
 - The newer GPT4All-J model is not yet supported!
 
+### Using Pygmalion 7B & Metharme 7B
+
+- Obtain the [LLaMA weights](#obtaining-the-facebook-llama-original-model-and-stanford-alpaca-model-data)
+- Obtain the [Pygmalion 7B](https://huggingface.co/PygmalionAI/pygmalion-7b/) or [Metharme 7B](https://huggingface.co/PygmalionAI/metharme-7b) XOR-encoded weights
+- Convert the LLaMA model with [the latest HF convert script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py)
+- Merge the XOR files with the converted LLaMA weights by running the [xor_codec](https://huggingface.co/PygmalionAI/pygmalion-7b/blob/main/xor_codec.py) script
+- Convert to `ggml` format using the `convert.py` script in this repo:
+```bash
+python3 convert.py pygmalion-7b/ --outtype q4_1
+```
+> The Pygmalion 7B & Metharme 7B weights are saved in [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) precision. If you wish to convert to `ggml` without quantizing, specify `--outtype f32` instead of `f16`.
+
+
 ### Obtaining the Facebook LLaMA original model and Stanford Alpaca model data
 
 - **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
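
Note on the bfloat16 remark in the Pygmalion 7B / Metharme 7B hunk: the added text does not say why `f32` is preferred over `f16` for unquantized output. One plausible reason, offered here as an assumption and not as the PR author's stated rationale, is numeric range. bfloat16 keeps the 8 exponent bits of float32, while IEEE half precision has only 5, so widening bfloat16 to `f32` is exact whereas narrowing to `f16` can overflow. The sketch below illustrates this with Python's standard `struct` module; the helper names are hypothetical and this is not how `convert.py` performs the conversion.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern by zero-padding it to a float32 pattern."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def fits_in_f16(value: float) -> bool:
    """Return True if `value` can be packed as IEEE 754 half precision without overflow."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


small = bf16_bits_to_float(0x3FC0)  # bit pattern for 1.5
large = bf16_bits_to_float(0x47C8)  # bit pattern for 102400.0

print(small, fits_in_f16(small))
print(large, fits_in_f16(large))
```

Both values are valid bfloat16 numbers and survive the widening to float32 unchanged, but the second one exceeds the largest finite half-precision value (65504) and so cannot be stored as `f16`.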