Update ExLlama.md (#2729)

Add details for configuring exllama
2024-11-25 17:29:22 +01:00 · 2023-06-16 21:46:25 -05:00 · a1ca1c04a1
commit a1ca1c04a1
parent b27f83c0e9
1 changed file with 5 additions and 1 deletion
--- a/docs/ExLlama.md
+++ b/docs/ExLlama.md
@@ -2,7 +2,7 @@
 ## About
-ExLlama is an extremely optimized GPTQ backend for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
+ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
 ## Installation:
@@ -15,3 +15,7 @@ git clone https://github.com/turboderp/exllama
 ```
 2) Follow the remaining set up instructions in the official README: https://github.com/turboderp/exllama#exllama
+3) Configure text-generation-webui to use exllama via the UI or command line:
+   - In the "Model" tab, set "Loader" to "exllama"
+   - Specify `--loader exllama` on the command line
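
The command-line option added in step 3 might look like the sketch below. Only `--loader exllama` comes from the change itself. The `server.py` entry point and the `--model` flag are assumed from a standard text-generation-webui checkout, and `my-llama-gptq` is a hypothetical model folder name. The snippet only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical model folder under models/; substitute your own.
MODEL_NAME="my-llama-gptq"

# --loader exllama is the flag named in the doc change above.
# server.py and --model are assumptions about the checkout layout.
CMD="python server.py --model ${MODEL_NAME} --loader exllama"

# Print the command for review; run it from the text-generation-webui root.
echo "${CMD}"
```

Setting "Loader" to "exllama" in the "Model" tab should have the same effect without restarting from the command line.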