mirror of
https://github.com/oobabooga/text-generation-webui.git
synced 2025-02-02 23:23:11 +01:00
Add docs for performance optimizations
This commit is contained in:
parent
c6552785af
commit
c333e4c906
48
docs/Performance-optimizations.md
Normal file
@ -0,0 +1,48 @@
# Performance optimizations
In order to get the highest possible performance for your hardware, you can try compiling the following 3 backends manually instead of relying on the pre-compiled binaries that are part of `requirements.txt`:
* AutoGPTQ (the default GPTQ loader)
* GPTQ-for-LLaMa (secondary GPTQ loader)
* llama-cpp-python
If you go this route, you should update the Python requirements for the webui in the future with
```
pip install -r requirements-minimal.txt --upgrade
```
and then install the up-to-date backends using the commands below. The file `requirements-minimal.txt` contains all the requirements except for the pre-compiled wheels for GPTQ and llama-cpp-python.
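Before installing the rebuilt backends, it can be worth checking which of the pre-compiled packages are currently present (a minimal sketch using standard `pip` commands; the package names are taken from the sections below):

```
# Show any of the pre-compiled backend packages currently installed;
# "|| true" keeps the command from failing when none are found
pip list 2>/dev/null | grep -iE 'auto-gptq|quant-cuda|llama-cpp-python' || true
```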
## AutoGPTQ
```
conda activate textgen
pip uninstall auto-gptq -y
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .
```
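After the build finishes, a quick way to confirm that the locally compiled package replaced the pre-compiled wheel is `pip show` (a sketch; the package name `auto-gptq` is assumed from the uninstall step above):

```
# Print the installed AutoGPTQ version and install location;
# prints only a warning if the package is missing
pip show auto-gptq || true
```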
## GPTQ-for-LLaMa
```
conda activate textgen
pip uninstall quant-cuda -y
cd text-generation-webui/repositories
rm -r GPTQ-for-LLaMa
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
```
## llama-cpp-python
If you do not have a GPU:
```
conda activate textgen
pip uninstall -y llama-cpp-python
pip install llama-cpp-python
```
If you have a GPU, use the commands here instead: [llama.cpp-models.md#gpu-acceleration](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration)
@ -19,3 +19,4 @@
* [WSL installation guide](WSL-installation-guide.md)
* [Docker Compose](Docker.md)
* [Audio notification](Audio-Notification.md)
* [Performance optimizations](Performance-optimizations.md)
22
requirements-minimal.txt
Normal file
@ -0,0 +1,22 @@
accelerate==0.20.3
colorama
datasets
einops
flexgen==0.1.7
gradio_client==0.2.5
gradio==3.33.1
markdown
numpy
pandas
Pillow>=9.5.0
pyyaml
requests
safetensors==0.3.1
sentencepiece
tqdm
scipy
transformers==4.30.0
git+https://github.com/huggingface/peft@e45529b149c7f91ec1d4d82a5a152ef56c56cb94
bitsandbytes==0.39.0; platform_system != "Windows"
https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.39.0-py3-none-any.whl; platform_system == "Windows"
llama-cpp-python==0.1.57; platform_system != "Windows"
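The `platform_system` suffixes in this file are PEP 508 environment markers: pip evaluates them against the running interpreter, so only the requirement matching the current OS is installed. A minimal sketch of the same check:

```
# pip compares platform_system against the value reported by Python's
# platform module; on Linux/macOS this prints True, on Windows False
python -c 'import platform; print(platform.system() != "Windows")'
```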