text-generation-webui/docs/LLaMA-model.md
zaypen 084b006cfe
Update LLaMA-model.md (#2460)
Better approach of converting LLaMA model
2023-06-07 15:34:50 -03:00


LLaMA is a Large Language Model developed by Meta AI.

It was trained on more tokens than previous models. As a result, the smallest version, with 7 billion parameters, performs similarly to GPT-3, which has 175 billion parameters.

This guide covers usage through the official transformers implementation. For 4-bit mode, head over to GPTQ models (4 bit mode).

Getting the weights

Option 1: pre-converted weights

⚠️ The tokenizers for the Torrent source, and for many LLaMA fine-tunes available on Hugging Face, may be outdated, so I recommend downloading the following universal LLaMA tokenizer:

```
python download-model.py oobabooga/llama-tokenizer
```

Once downloaded, it will be automatically applied to every LlamaForCausalLM model that you try to load.
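The automatic behavior described above can be pictured as a small lookup rule. This is a simplified sketch, not the web UI's actual loading code: it assumes that download-model.py saves the tokenizer to models/oobabooga_llama-tokenizer and that a model's architecture is read from the "architectures" field of its config.json. The function name pick_tokenizer_dir is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumption: download-model.py saves "oobabooga/llama-tokenizer" under this folder name.
UNIVERSAL_TOKENIZER = "oobabooga_llama-tokenizer"


def pick_tokenizer_dir(models_dir: Path, model_name: str) -> Path:
    """Hypothetical helper: choose the folder a tokenizer should be loaded from.

    Models whose config.json lists "LlamaForCausalLM" get the universal tokenizer
    when it has been downloaded; every other model keeps its own tokenizer files.
    """
    model_dir = models_dir / model_name
    config_path = model_dir / "config.json"
    architectures = []
    if config_path.is_file():
        architectures = json.loads(config_path.read_text()).get("architectures", [])

    universal = models_dir / UNIVERSAL_TOKENIZER
    if "LlamaForCausalLM" in architectures and universal.is_dir():
        return universal
    return model_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        models = Path(tmp)
        (models / "llama-7b").mkdir()
        (models / "llama-7b" / "config.json").write_text(
            json.dumps({"architectures": ["LlamaForCausalLM"]})
        )
        print(pick_tokenizer_dir(models, "llama-7b").name)  # llama-7b
        (models / UNIVERSAL_TOKENIZER).mkdir()
        print(pick_tokenizer_dir(models, "llama-7b").name)  # oobabooga_llama-tokenizer
```

In this sketch, non-LLaMA models always fall through to their own folder, so downloading the universal tokenizer leaves them untouched.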

Option 2: convert the weights yourself

  1. Install the protobuf library:

     ```
     pip install protobuf==3.20.1
     ```

  1. Use the script below to convert the .pth model files that you, a fellow academic, downloaded through Meta's official link.

     Convert LLaMA to HuggingFace format

     If transformers is already installed, run the conversion module it ships with:

     ```
     python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir /path/to/LLaMA --model_size 7B --output_dir /tmp/outputs/llama-7b
     ```

     Otherwise, download the convert_llama_weights_to_hf.py script and run it directly:

     ```
     python convert_llama_weights_to_hf.py --input_dir /path/to/LLaMA --model_size 7B --output_dir /tmp/outputs/llama-7b
     ```

  1. Move the resulting llama-7b folder (here /tmp/outputs/llama-7b) into your text-generation-webui/models folder.
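The conversion needs specific files under --input_dir, so a quick pre-flight check can catch a wrong path before a long run. The sketch below assumes the layout of Meta's original download: tokenizer.model at the top level, and a folder named after --model_size holding params.json and consolidated.NN.pth shards (1 for 7B, 2 for 13B, 4 for 30B, 8 for 65B). The helper missing_llama_files is hypothetical, and the version of the script you run may accept other sizes.

```python
import tempfile
from pathlib import Path

# Assumption: shard counts (model-parallel size) of the original LLaMA release.
NUM_SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}


def missing_llama_files(input_dir: Path, model_size: str) -> list[str]:
    """Hypothetical pre-flight check: list expected files that are not present.

    Paths are returned relative to input_dir, with forward slashes.
    """
    if model_size not in NUM_SHARDS:
        raise ValueError(f"unknown model size: {model_size!r}")
    expected = [Path("tokenizer.model"), Path(model_size) / "params.json"]
    expected += [
        Path(model_size) / f"consolidated.{i:02d}.pth"
        for i in range(NUM_SHARDS[model_size])
    ]
    return [p.as_posix() for p in expected if not (input_dir / p).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "7B").mkdir()
        (root / "tokenizer.model").touch()
        (root / "7B" / "params.json").touch()
        print(missing_llama_files(root, "7B"))  # ['7B/consolidated.00.pth']
```

An empty list means every file the sketch knows about is in place; it does not validate file contents.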

Starting the web UI

```
python server.py --model llama-7b
```
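Because --model llama-7b refers to the folder of that name inside text-generation-webui/models, checking that the moved folder looks like a Hugging Face checkpoint can catch a misplaced folder before launch. This sketch assumes a converted folder holds a parseable config.json plus at least one weight file ending in .bin or .safetensors; the helper looks_like_hf_model is hypothetical and is not part of the web UI.

```python
import json
import tempfile
from pathlib import Path


def looks_like_hf_model(model_dir: Path) -> bool:
    """Hypothetical check: does this folder resemble a converted Hugging Face checkpoint?"""
    config_path = model_dir / "config.json"
    if not config_path.is_file():
        return False
    try:
        json.loads(config_path.read_text())
    except json.JSONDecodeError:
        return False
    # Weights may be one file or several shards, in either serialization format.
    weights = list(model_dir.glob("*.bin")) + list(model_dir.glob("*.safetensors"))
    return len(weights) > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "models" / "llama-7b"
        model_dir.mkdir(parents=True)
        print(looks_like_hf_model(model_dir))  # False: no config.json yet
        (model_dir / "config.json").write_text(
            json.dumps({"architectures": ["LlamaForCausalLM"]})
        )
        (model_dir / "pytorch_model-00001-of-00002.bin").touch()
        print(looks_like_hf_model(model_dir))  # True
```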