From a1ca1c04a15d32e9d0c035077c0020858091573a Mon Sep 17 00:00:00 2001
From: Jonathan Yankovich
Date: Fri, 16 Jun 2023 21:46:25 -0500
Subject: [PATCH] Update ExLlama.md (#2729)

Add details for configuring exllama

---
 docs/ExLlama.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/ExLlama.md b/docs/ExLlama.md
index a9fd016d..1a51f188 100644
--- a/docs/ExLlama.md
+++ b/docs/ExLlama.md
@@ -2,7 +2,7 @@
 
 ## About
 
-ExLlama is an extremely optimized GPTQ backend for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
+ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
 
 ## Installation:
 
@@ -15,3 +15,7 @@ git clone https://github.com/turboderp/exllama
 ```
 
 2) Follow the remaining set up instructions in the official README: https://github.com/turboderp/exllama#exllama
+
+3) Configure text-generation-webui to use exllama via the UI or command line:
+   - In the "Model" tab, set "Loader" to "exllama"
+   - Specify `--loader exllama` on the command line