diff --git a/docs/ExLlama.md b/docs/ExLlama.md
index a9fd016d..1a51f188 100644
--- a/docs/ExLlama.md
+++ b/docs/ExLlama.md
@@ -2,7 +2,7 @@
 
 ## About
 
-ExLlama is an extremely optimized GPTQ backend for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
+ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
 
 ## Installation:
 
@@ -15,3 +15,7 @@ git clone https://github.com/turboderp/exllama
 ```
 
 2) Follow the remaining set up instructions in the official README: https://github.com/turboderp/exllama#exllama
+
+3) Configure text-generation-webui to use exllama via the UI or command line:
+   - In the "Model" tab, set "Loader" to "exllama"
+   - Specify `--loader exllama` on the command line