From aff3e04df41281c0a9107ff8cf142160eedf50d1 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Fri, 9 Jun 2023 21:15:37 -0300
Subject: [PATCH] Remove irrelevant docs

Compiling from source, in my tests, makes no difference in the resulting
tokens/s.
---
 docs/Performance-optimizations.md | 48 -------------------------------
 docs/README.md                    |  1 -
 requirements-minimal.txt          | 21 --------------
 3 files changed, 70 deletions(-)
 delete mode 100644 docs/Performance-optimizations.md
 delete mode 100644 requirements-minimal.txt

diff --git a/docs/Performance-optimizations.md b/docs/Performance-optimizations.md
deleted file mode 100644
index c5fae54e..00000000
--- a/docs/Performance-optimizations.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Performance optimizations
-
-In order to get the highest possible performance for your hardware, you can try compiling the following 3 backends manually instead of relying on the pre-compiled binaries that are part of `requirements.txt`:
-
-* AutoGPTQ (the default GPTQ loader)
-* GPTQ-for-LLaMa (secondary GPTQ loader)
-* llama-cpp-python
-
-If you go this route, you should update the Python requirements for the webui in the future with
-
-```
-pip install -r requirements-minimal.txt --upgrade
-```
-
-and then install the up-to-date backends using the commands below. The file `requirements-minimal.txt` contains all the requirements except for the pre-compiled wheels for GPTQ and llama-cpp-python.
-
-## AutoGPTQ
-
-```
-conda activate textgen
-pip uninstall auto-gptq -y
-git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
-pip install .
-```
-
-## GPTQ-for-LLaMa
-
-```
-conda activate textgen
-pip uninstall quant-cuda -y
-cd text-generation-webui/repositories
-rm -r GPTQ-for-LLaMa
-git clone https://github.com/oobabooga/GPTQ-for-LLaMa
-cd GPTQ-for-LLaMa
-python setup_cuda.py install
-```
-
-## llama-cpp-python
-
-If you do not have a GPU:
-
-```
-conda activate textgen
-pip uninstall -y llama-cpp-python
-pip install llama-cpp-python
-```
-
-If you have a GPU, use the commands here instead: [llama.cpp-models.md#gpu-acceleration](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration)
diff --git a/docs/README.md b/docs/README.md
index 72e816de..37c4fe37 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -19,4 +19,3 @@
 * [WSL installation guide](WSL-installation-guide.md)
 * [Docker Compose](Docker.md)
 * [Audio notification](Audio-Notification.md)
-* [Performance optimizations](Performance-optimizations.md)
diff --git a/requirements-minimal.txt b/requirements-minimal.txt
deleted file mode 100644
index 07b4d838..00000000
--- a/requirements-minimal.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-accelerate==0.20.3
-colorama
-datasets
-einops
-flexgen==0.1.7
-gradio_client==0.2.5
-gradio==3.33.1
-markdown
-numpy
-pandas
-Pillow>=9.5.0
-pyyaml
-requests
-safetensors==0.3.1
-sentencepiece
-tqdm
-scipy
-transformers==4.30.0
-git+https://github.com/huggingface/peft@e45529b149c7f91ec1d4d82a5a152ef56c56cb94
-bitsandbytes==0.39.0; platform_system != "Windows"
-https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.39.0-py3-none-any.whl; platform_system == "Windows"
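
Note on the commit message's claim: "makes no difference in the resulting tokens/s" is the kind of statement that is easy to check with a small throughput harness. The sketch below is one possible way to do it; `tokens_per_second` and the `generate` callable are hypothetical names for illustration, not functions from text-generation-webui or any of the removed backends.

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Average decode throughput of a generate() callable.

    `generate` is any function that takes a prompt and returns a
    sequence of tokens. To compare a pre-compiled wheel against a
    source-built backend, run this once with each and compare the
    numbers. (Hypothetical helper, not part of the webui.)
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        rates.append(len(tokens) / (time.perf_counter() - start))
    return sum(rates) / len(rates)

# Demo with a dummy backend that "generates" 50 tokens per call:
def dummy_generate(prompt):
    time.sleep(0.05)  # simulate decode latency
    return list(range(50))

print(f"{tokens_per_second(dummy_generate, 'hello'):.0f} tokens/s")
```

Comparing averages over a few runs (rather than a single call) smooths out warm-up and caching effects, which otherwise dominate short benchmarks like this.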