# ExLlama
## About

ExLlama is a highly optimized GPTQ backend ("loader") for LLaMA models. Because it does not rely on the unoptimized transformers code, it uses much less VRAM and runs much faster.

## Installation

1) Clone the ExLlama repository into your `text-generation-webui/repositories` folder:

```shell
# Run these commands from the text-generation-webui folder.
# -p avoids an error if the repositories folder already exists.
mkdir -p repositories
cd repositories
git clone https://github.com/turboderp/exllama
```

2) Follow the remaining setup instructions in the official README: https://github.com/turboderp/exllama#exllama

3) Configure text-generation-webui to use ExLlama, either in the UI or on the command line:
- In the UI: open the "Model" tab and set "Loader" to "exllama".
- On the command line: add `--loader exllama`.
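
For the command-line route, a launch might look roughly like the following. This is a sketch that assumes you start the web UI with `server.py` from the `text-generation-webui` folder, and that `MODEL_FOLDER` (a hypothetical placeholder, not a real model name) is the folder of a GPTQ-quantized LLaMA model inside `models/`.

```shell
# Run from the text-generation-webui folder.
# MODEL_FOLDER is a hypothetical placeholder: replace it with the name of
# your GPTQ LLaMA model's folder under models/.
python server.py --model MODEL_FOLDER --loader exllama
```

Passing `--loader exllama` on the command line has the same effect as selecting "exllama" in the "Model" tab.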