diff --git a/README.md b/README.md
index 3a7e7eab..f1af6519 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Text generation web UI
 
-A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA.
+A Gradio web UI for Large Language Models.
 
 Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) of text generation.
 
@@ -10,20 +10,18 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
 
 ## Features
 
-* 3 interface modes: default, notebook, and chat
+* 3 interface modes: default (two columns), notebook, and chat
 * Multiple model backends: [transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [ExLlama](https://github.com/turboderp/exllama), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [ctransformers](https://github.com/marella/ctransformers)
 * Dropdown menu for quickly switching between different models
-* LoRA: load and unload LoRAs on the fly, train a new LoRA
-* Precise instruction templates for chat mode, including Llama 2, Alpaca, Vicuna, WizardLM, StableLM, and many others
+* LoRA: load and unload LoRAs on the fly, train a new LoRA using QLoRA
+* Precise instruction templates for chat mode, including Llama-2-chat, Alpaca, Vicuna, WizardLM, StableLM, and many others
+* 4-bit, 8-bit, and CPU inference through the transformers library
+* Use llama.cpp models with transformers samplers (`llamacpp_HF` loader)
 * [Multimodal pipelines, including LLaVA and MiniGPT-4](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/multimodal)
-* 8-bit and 4-bit inference through bitsandbytes
-* CPU mode for transformers models
-* [DeepSpeed ZeRO-3 inference](docs/DeepSpeed.md)
-* [Extensions](docs/Extensions.md)
+* [Extensions framework](docs/Extensions.md)
 * [Custom chat characters](docs/Chat-mode.md)
 * Very efficient text streaming
 * Markdown output with LaTeX rendering, to use for instance with [GALACTICA](https://github.com/paperswithcode/galai)
-* Nice HTML output for GPT-4chan
 * API, including endpoints for websocket streaming ([see the examples](https://github.com/oobabooga/text-generation-webui/blob/main/api-examples))
 
 To learn how to use the various features, check out the Documentation: https://github.com/oobabooga/text-generation-webui/tree/main/docs
@@ -38,26 +36,24 @@ To learn how to use the various features, check out the Documentation: https://g
 
 Just download the zip above, extract it, and double-click on "start". The web UI and all its dependencies will be installed in the same folder.
 
-* The source codes are here: https://github.com/oobabooga/one-click-installers
+* The source code and more information can be found here: https://github.com/oobabooga/one-click-installers
 * There is no need to run the installers as admin.
-* AMD doesn't work on Windows.
 * Huge thanks to [@jllllll](https://github.com/jllllll), [@ClayShoaf](https://github.com/ClayShoaf), and [@xNul](https://github.com/xNul) for their contributions to these installers.
 
 ### Manual installation using Conda
 
-Recommended if you have some experience with the command line.
+Recommended if you have some experience with the command-line.
 
 #### 0. Install Conda
 
 https://docs.conda.io/en/latest/miniconda.html
 
-On Linux or WSL, it can be automatically installed with these two commands:
+On Linux or WSL, it can be automatically installed with these two commands ([source](https://educe-ubc.github.io/conda.html)):
 
 ```
 curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
 bash Miniconda3.sh
 ```
-Source: https://educe-ubc.github.io/conda.html
 
 #### 1. Create a new conda environment
 
@@ -92,9 +88,9 @@ cd text-generation-webui
 pip install -r requirements.txt
 ```
 
-#### bitsandbytes
+#### Note about older NVIDIA GPUs
 
-bitsandbytes >= 0.39 may not work on older NVIDIA GPUs. In that case, to use `--load-in-8bit`, you may have to downgrade like this:
+bitsandbytes >= 0.39 may not work. In that case, to use `--load-in-8bit`, you may have to downgrade like this:
 
 * Linux: `pip install bitsandbytes==0.38.1`
 * Windows: `pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl`
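+
+If you are unsure whether your GPU counts as "older", one quick check is its CUDA compute capability. This is a diagnostic sketch, not an official installation step; it assumes PyTorch is already installed from the requirements above:
+
+```
+# Prints a (major, minor) tuple, e.g. (6, 1) on a GTX 10-series card or (8, 6) on an RTX 30-series.
+# Roughly speaking, the lower the value, the more likely you are to need the downgrade above.
+python -c "import torch; print(torch.cuda.get_device_capability())"
+```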
@@ -113,37 +109,52 @@ docker compose up --build
 
 ### Updating the requirements
 
-From time to time, the `requirements.txt` changes. To update, use this command:
+From time to time, the `requirements.txt` changes. To update, use these commands:
 
 ```
 conda activate textgen
 cd text-generation-webui
 pip install -r requirements.txt --upgrade
 ```
+
 ## Downloading models
 
-Models should be placed inside the `models/` folder.
+Models should be placed in the `text-generation-webui/models` folder. They are usually downloaded from [Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads).
 
-[Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads) is the main place to download models. These are some examples:
+* Transformers or GPTQ models are made of several files and must be placed in a subfolder. Example:
 
-* [Pythia](https://huggingface.co/models?sort=downloads&search=eleutherai%2Fpythia+deduped)
-* [OPT](https://huggingface.co/models?search=facebook/opt)
-* [GALACTICA](https://huggingface.co/models?search=facebook/galactica)
-* [GPT-J 6B](https://huggingface.co/EleutherAI/gpt-j-6B/tree/main)
+```
+text-generation-webui/
+├── models
+│   ├── lmsys_vicuna-33b-v1.3
+│   │   ├── config.json
+│   │   ├── generation_config.json
+│   │   ├── huggingface-metadata.txt
+│   │   ├── pytorch_model-00001-of-00007.bin
+│   │   ├── pytorch_model-00002-of-00007.bin
+│   │   ├── pytorch_model-00003-of-00007.bin
+│   │   ├── pytorch_model-00004-of-00007.bin
+│   │   ├── pytorch_model-00005-of-00007.bin
+│   │   ├── pytorch_model-00006-of-00007.bin
+│   │   ├── pytorch_model-00007-of-00007.bin
+│   │   ├── pytorch_model.bin.index.json
+│   │   ├── README.md
+│   │   ├── special_tokens_map.json
+│   │   ├── tokenizer_config.json
+│   │   └── tokenizer.model
+```
 
-You can automatically download a model from HF using the script `download-model.py`:
+In the "Model" tab of the UI, those models can be downloaded from Hugging Face. You can also download them from the command line with `python download-model.py organization/model`.
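+
+For example, to fetch the Vicuna model from the folder layout above (assuming, as that layout suggests, that the downloader saves `organization/model` under a `models/organization_model` subfolder):
+
+```
+# Example command: downloads into models/lmsys_vicuna-33b-v1.3
+python download-model.py lmsys/vicuna-33b-v1.3
+```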
 
-    python download-model.py organization/model
+* GGML models are a single file and can be placed directly into `models`. Example:
 
-For example:
+```
+text-generation-webui/
+├── models
+│   ├── llama-13b.ggmlv3.q4_K_M.bin
+```
 
-    python download-model.py facebook/opt-1.3b
-
-To download a protected model, set env vars `HF_USER` and `HF_PASS` to your Hugging Face username and password (or [User Access Token](https://huggingface.co/settings/tokens)). The model's terms must first be accepted on the HF website.
-
-#### GGML models
-
-You can drop these directly into the `models/` folder, making sure that the file name contains `ggml` somewhere and ends in `.bin`.
+Those models have to be downloaded manually and placed into that folder.
 
 #### GPT-4chan
 
@@ -354,5 +365,5 @@ If you would like to contribute to the project, check out the [Contributing guid
 
 ## Community
 
-* Subreddit: https://www.reddit.com/r/oobaboogazz/
+* Subreddit: https://www.reddit.com/r/oobabooga/
 * Discord: https://discord.gg/jwZCF2dPQN