diff --git a/docs/GPTQ-models-(4-bit-mode).md b/docs/GPTQ-models-(4-bit-mode).md
index 8eaf86ca..a576213f 100644
--- a/docs/GPTQ-models-(4-bit-mode).md
+++ b/docs/GPTQ-models-(4-bit-mode).md
@@ -6,7 +6,7 @@ AutoGPTQ is the recommended way to create new quantized models: https://github.c
 
 ### Installation
 
-To load a model quantized with AutoGPTQ in the web UI, manual installation is currently necessary:
+To load a model quantized with AutoGPTQ in the web UI, you first need to manually install the AutoGPTQ library:
 
 ```
 conda activate textgen
@@ -14,12 +14,27 @@ git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
 pip install .
 ```
 
-You are going to need to have `nvcc` installed (see the [instructions below](https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md#step-0-install-nvcc)).
+The last command requires `nvcc` to be installed (see the [instructions below](https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md#step-0-install-nvcc)).
 
 ### Usage
 
-Place the output folder generated by AutoGPTQ in your `models/` folder and load it with the `--autogptq` flag:
+When you quantize a model using AutoGPTQ, a folder containing a file called `quantize_config.json` will be generated. Place that folder inside your `models/` folder and load it with the `--autogptq` flag:
 
 ```
 python server.py --autogptq --model model_name
 ```
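+
+For reference, the folder you place in `models/` will look roughly like this (a sketch only: the folder name below is a made-up example, and the exact file names depend on the model and on the options used when it was saved):
+
+```
+models/opt-125m-4bit/
+├── config.json
+├── gptq_model-4bit-128g.safetensors
+└── quantize_config.json
+```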
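+
+If the `pip install .` step in the installation section fails while compiling the CUDA kernels, a quick sanity check (the output varies by system) is to confirm that `nvcc` is actually on your PATH:
+
+```
+nvcc --version
+```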