mirror of
https://github.com/oobabooga/text-generation-webui.git
synced 2025-01-26 12:22:08 +01:00
49 lines
1.4 KiB
Markdown

# Performance optimizations

To get the highest possible performance from your hardware, you can compile the following three backends manually instead of relying on the pre-compiled binaries listed in `requirements.txt`:

* AutoGPTQ (the default GPTQ loader)
* GPTQ-for-LLaMa (secondary GPTQ loader)
* llama-cpp-python

If you go this route, update the Python requirements for the webui in the future with

```
pip install -r requirements-minimal.txt --upgrade
```

and then install the up-to-date backends using the commands below. The file `requirements-minimal.txt` contains all the requirements except for the pre-compiled wheels for GPTQ and llama-cpp-python.

## AutoGPTQ

```
conda activate textgen
# Remove the pre-compiled package without prompting for confirmation
pip uninstall auto-gptq -y
# Build and install AutoGPTQ from source
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .
```

## GPTQ-for-LLaMa

```
conda activate textgen
# Remove the installed quant-cuda package
pip uninstall quant-cuda -y
# Replace the existing checkout with a fresh clone
cd text-generation-webui/repositories
rm -r GPTQ-for-LLaMa
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
# Build and install the CUDA extension
python setup_cuda.py install
```

## llama-cpp-python

If you do not have a GPU:

```
conda activate textgen
pip uninstall -y llama-cpp-python
pip install llama-cpp-python
```

If you have a GPU, use the commands here instead: [llama.cpp-models.md#gpu-acceleration](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration)
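
After rebuilding, it can be useful to confirm which of the three backends are actually present in the active environment. The following is a minimal sketch, not part of the webui: `check_backend` is a hypothetical helper name, and the package names are assumed to match the ones used in the `pip uninstall` commands above. It relies only on the exit status of `pip show`.

```shell
# Hypothetical helper (not part of the webui): report whether pip
# knows about a package in the currently active environment.
check_backend() {
    if pip show "$1" > /dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not installed"
    fi
}

# Package names assumed from the uninstall commands in this document.
for pkg in auto-gptq quant-cuda llama-cpp-python; do
    check_backend "$pkg"
done
```

Run it after `conda activate textgen` so that `pip` refers to the same environment the webui uses.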