Merge branch 'oobabooga:dev' into dev

This commit is contained in:
Artificiangel 2024-10-12 17:59:33 -04:00 committed by GitHub
commit ac2dc37c76
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
29 changed files with 435 additions and 306 deletions

View File

@ -10,33 +10,29 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
## Features ## Features
* Multiple backends for text generation in a single UI and API, including [Transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp) (through [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)), [ExLlamaV2](https://github.com/turboderp/exllamav2), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [HQQ](https://github.com/mobiusml/hqq), and [AQLM](https://github.com/Vahe1994/AQLM) are also supported through the Transformers loader. - Supports multiple text generation backends in one UI/API, including [Transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [ExLlamaV2](https://github.com/turboderp/exllamav2). [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [HQQ](https://github.com/mobiusml/hqq), and [AQLM](https://github.com/Vahe1994/AQLM) are also supported but you need to install them manually.
* OpenAI-compatible API server with Chat and Completions endpoints see the [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples). - OpenAI-compatible API with Chat and Completions endpoints see [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples).
* Automatic prompt formatting for each model using the Jinja2 template in its metadata. - Automatic prompt formatting using Jinja2 templates.
* Three chat modes: `instruct`, `chat-instruct`, and `chat`, allowing for both instruction-following and casual conversations with characters. `chat-instruct` mode automatically applies the model's template to the chat prompt, ensuring high-quality outputs without manual setup. - Three chat modes: `instruct`, `chat-instruct`, and `chat`, with automatic prompt templates in `chat-instruct`.
* "Past chats" menu to quickly switch between conversations and start new ones. - "Past chats" menu to quickly switch between conversations.
* Free-form generation in the Default/Notebook tabs without being limited to chat turns. Send formatted chat conversations from the Chat tab to these tabs. - Free-form text generation in the Default/Notebook tabs without being limited to chat turns. You can send formatted conversations from the Chat tab to these.
* Multiple sampling parameters and generation options for sophisticated text generation control. - Multiple sampling parameters and generation options for sophisticated text generation control.
* Easy switching between different models through the UI without restarting, using the "Model" tab. - Switch between different models easily in the UI without restarting.
* Simple LoRA fine-tuning tool to customize models with your data. - Simple LoRA fine-tuning tool.
* All in one folder. The requirements are installed in a self-contained `installer_files` folder that doesn't interfere with the system's environment. - Requirements installed in a self-contained `installer_files` directory that doesn't interfere with the system environment.
* Extensions support, including numerous built-in and user-contributed extensions. See [the wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [the extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details. - Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
## How to install ## How to install
1) Clone or [download](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip) the repository. 1) Clone or [download the repository](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip).
2) Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script depending on your OS. 2) Run the script that matches your OS: `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat`.
3) Select your GPU vendor when asked. 3) Select your GPU vendor when asked.
4) Once the installation ends, browse to `http://localhost:7860`. 4) Once the installation ends, browse to `http://localhost:7860`.
5) Have fun! 5) Have fun!
To restart the web UI in the future, run the `start_` script again. To restart the web UI later, just run the same `start_` script. If you need to reinstall, delete the `installer_files` folder created during setup and run the script again.
This script creates an `installer_files` folder where it sets up the project's requirements. If you need to reinstall the requirements, just delete that folder and start the web UI again. You can use command-line flags, like `./start_linux.sh --help`, or add them to `CMD_FLAGS.txt` (such as `--api` to enable API use). To update the project, run `update_wizard_linux.sh`, `update_wizard_windows.bat`, `update_wizard_macos.sh`, or `update_wizard_wsl.bat`.
The script accepts command-line flags, such as `./start_linux.sh --help`. Alternatively, you can edit the `CMD_FLAGS.txt` file with a text editor and add your flags there, such as `--api` in case you need to use the API.
To get updates in the future, run `update_wizard_linux.sh`, `update_wizard_windows.bat`, `update_wizard_macos.sh`, or `update_wizard_wsl.bat`.
<details> <details>
<summary> <summary>
@ -80,12 +76,12 @@ conda activate textgen
| System | GPU | Command | | System | GPU | Command |
|--------|---------|---------| |--------|---------|---------|
| Linux/WSL | NVIDIA | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121` | | Linux/WSL | NVIDIA | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121` |
| Linux/WSL | CPU only | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu` | | Linux/WSL | CPU only | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cpu` |
| Linux | AMD | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/rocm5.6` | | Linux | AMD | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/rocm6.1` |
| MacOS + MPS | Any | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2` | | MacOS + MPS | Any | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1` |
| Windows | NVIDIA | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121` | | Windows | NVIDIA | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121` |
| Windows | CPU only | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2` | | Windows | CPU only | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1` |
The up-to-date commands can be found here: https://pytorch.org/get-started/locally/. The up-to-date commands can be found here: https://pytorch.org/get-started/locally/.
@ -150,7 +146,7 @@ Then browse to
1) For Kepler GPUs and older, you will need to install CUDA 11.8 instead of 12: 1) For Kepler GPUs and older, you will need to install CUDA 11.8 instead of 12:
``` ```
pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118 pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
conda install -y -c "nvidia/label/cuda-11.8.0" cuda-runtime conda install -y -c "nvidia/label/cuda-11.8.0" cuda-runtime
``` ```

View File

@ -38,6 +38,8 @@ class GenerationOptions(BaseModel):
dry_base: float = 1.75 dry_base: float = 1.75
dry_allowed_length: int = 2 dry_allowed_length: int = 2
dry_sequence_breakers: str = '"\\n", ":", "\\"", "*"' dry_sequence_breakers: str = '"\\n", ":", "\\"", "*"'
xtc_threshold: float = 0.1
xtc_probability: float = 0
truncation_length: int = 0 truncation_length: int = 0
max_tokens_second: int = 0 max_tokens_second: int = 0
prompt_lookup_num_tokens: int = 0 prompt_lookup_num_tokens: int = 0

View File

@ -96,7 +96,7 @@ def ui():
with gr.Accordion("Settings", open=False): with gr.Accordion("Settings", open=False):
auto_submit = gr.Checkbox(label='Submit the transcribed audio automatically', value=params['auto_submit']) auto_submit = gr.Checkbox(label='Submit the transcribed audio automatically', value=params['auto_submit'])
device_dropd = gr.Dropdown(label='Device', value=str(startup_device), choices=["cuda", "cpu", "none"]) device_dropd = gr.Dropdown(label='Device', value=str(startup_device), choices=["cuda", "cpu", "none"])
whisper_model_dropd = gr.Dropdown(label='Whisper Model', value=params['whipser_model'], choices=["tiny.en", "base.en", "small.en", "medium.en", "tiny", "base", "small", "medium", "large"]) whisper_model_dropd = gr.Dropdown(label='Whisper Model', value=params['whipser_model'], choices=["tiny.en", "base.en", "small.en", "medium.en", "tiny", "base", "small", "medium", "large", "turbo"])
whisper_language = gr.Dropdown(label='Whisper Language', value=params['whipser_language'], choices=["english", "chinese", "german", "spanish", "russian", "korean", "french", "japanese", "portuguese", "turkish", "polish", "catalan", "dutch", "arabic", "swedish", "italian", "indonesian", "hindi", "finnish", "vietnamese", "hebrew", "ukrainian", "greek", "malay", "czech", "romanian", "danish", "hungarian", "tamil", "norwegian", "thai", "urdu", "croatian", "bulgarian", "lithuanian", "latin", "maori", "malayalam", "welsh", "slovak", "telugu", "persian", "latvian", "bengali", "serbian", "azerbaijani", "slovenian", "kannada", "estonian", "macedonian", "breton", "basque", "icelandic", "armenian", "nepali", "mongolian", "bosnian", "kazakh", "albanian", "swahili", "galician", "marathi", "punjabi", "sinhala", "khmer", "shona", "yoruba", "somali", "afrikaans", "occitan", "georgian", "belarusian", "tajik", "sindhi", "gujarati", "amharic", "yiddish", "lao", "uzbek", "faroese", "haitian creole", "pashto", "turkmen", "nynorsk", "maltese", "sanskrit", "luxembourgish", "myanmar", "tibetan", "tagalog", "malagasy", "assamese", "tatar", "hawaiian", "lingala", "hausa", "bashkir", "javanese", "sundanese"]) whisper_language = gr.Dropdown(label='Whisper Language', value=params['whipser_language'], choices=["english", "chinese", "german", "spanish", "russian", "korean", "french", "japanese", "portuguese", "turkish", "polish", "catalan", "dutch", "arabic", "swedish", "italian", "indonesian", "hindi", "finnish", "vietnamese", "hebrew", "ukrainian", "greek", "malay", "czech", "romanian", "danish", "hungarian", "tamil", "norwegian", "thai", "urdu", "croatian", "bulgarian", "lithuanian", "latin", "maori", "malayalam", "welsh", "slovak", "telugu", "persian", "latvian", "bengali", "serbian", "azerbaijani", "slovenian", "kannada", "estonian", "macedonian", "breton", "basque", "icelandic", "armenian", "nepali", "mongolian", "bosnian", "kazakh", "albanian", "swahili", "galician", "marathi", "punjabi", "sinhala", "khmer", "shona", "yoruba", "somali", "afrikaans", "occitan", "georgian", "belarusian", "tajik", "sindhi", "gujarati", "amharic", "yiddish", "lao", "uzbek", "faroese", "haitian creole", "pashto", "turkmen", "nynorsk", "maltese", "sanskrit", "luxembourgish", "myanmar", "tibetan", "tagalog", "malagasy", "assamese", "tatar", "hawaiian", "lingala", "hausa", "bashkir", "javanese", "sundanese"])
audio.change( audio.change(

View File

@ -600,4 +600,15 @@ headerBar.addEventListener("click", (e) => {
} }
}); });
//------------------------------------------------
// Add a confirmation dialog when leaving the page
// Useful to avoid data loss
//------------------------------------------------
window.addEventListener("beforeunload", function (event) {
// Cancel the event
event.preventDefault();
// Chrome requires returnValue to be set
event.returnValue = "";
});
moveToChatTab(); moveToChatTab();

View File

@ -1059,7 +1059,12 @@ def handle_start_new_chat_click(state):
convert_to_markdown.cache_clear() convert_to_markdown.cache_clear()
return [history, html, gr.update(choices=histories, value=histories[0][1])] if len(histories) > 0:
past_chats_update = gr.update(choices=histories, value=histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
return [history, html, past_chats_update]
def handle_delete_chat_confirm_click(state): def handle_delete_chat_confirm_click(state):
@ -1110,10 +1115,15 @@ def handle_upload_chat_history(load_chat_history, state):
convert_to_markdown.cache_clear() convert_to_markdown.cache_clear()
if len(histories) > 0:
past_chats_update = gr.update(choices=histories, value=histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
return [ return [
history, history,
html, html,
gr.update(choices=histories, value=histories[0][1]) past_chats_update
] ]
@ -1132,6 +1142,11 @@ def handle_character_menu_change(state):
convert_to_markdown.cache_clear() convert_to_markdown.cache_clear()
if len(histories) > 0:
past_chats_update = gr.update(choices=histories, value=histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
return [ return [
history, history,
html, html,
@ -1140,7 +1155,7 @@ def handle_character_menu_change(state):
picture, picture,
greeting, greeting,
context, context,
gr.update(choices=histories, value=histories[0][1]), past_chats_update,
] ]
@ -1151,12 +1166,17 @@ def handle_mode_change(state):
convert_to_markdown.cache_clear() convert_to_markdown.cache_clear()
if len(histories) > 0:
past_chats_update = gr.update(choices=histories, value=histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
return [ return [
history, history,
html, html,
gr.update(visible=state['mode'] != 'instruct'), gr.update(visible=state['mode'] != 'instruct'),
gr.update(visible=state['mode'] == 'chat-instruct'), gr.update(visible=state['mode'] == 'chat-instruct'),
gr.update(choices=histories, value=histories[0][1]) past_chats_update
] ]

View File

@ -7,6 +7,7 @@ from exllamav2 import (
ExLlamaV2Cache, ExLlamaV2Cache,
ExLlamaV2Cache_8bit, ExLlamaV2Cache_8bit,
ExLlamaV2Cache_Q4, ExLlamaV2Cache_Q4,
ExLlamaV2Cache_TP,
ExLlamaV2Config, ExLlamaV2Config,
ExLlamaV2Tokenizer ExLlamaV2Tokenizer
) )
@ -18,14 +19,6 @@ from modules.text_generation import get_max_prompt_length
try: try:
import flash_attn import flash_attn
except ModuleNotFoundError:
logger.warning(
'You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage '
'to be a lot higher than it could be.\n'
'Try installing flash-attention following the instructions here: '
'https://github.com/Dao-AILab/flash-attention#installation-and-features'
)
pass
except Exception: except Exception:
logger.warning('Failed to load flash-attention due to the following error:\n') logger.warning('Failed to load flash-attention due to the following error:\n')
traceback.print_exc() traceback.print_exc()
@ -54,21 +47,30 @@ class Exllamav2Model:
model = ExLlamaV2(config) model = ExLlamaV2(config)
if not shared.args.autosplit: split = None
split = None if shared.args.gpu_split:
if shared.args.gpu_split: split = [float(alloc) for alloc in shared.args.gpu_split.split(",")]
split = [float(alloc) for alloc in shared.args.gpu_split.split(",")]
if shared.args.enable_tp:
model.load_tp(split)
elif not shared.args.autosplit:
model.load(split) model.load(split)
# Determine the correct cache type
if shared.args.cache_8bit: if shared.args.cache_8bit:
cache = ExLlamaV2Cache_8bit(model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache_8bit
elif shared.args.cache_4bit: elif shared.args.cache_4bit:
cache = ExLlamaV2Cache_Q4(model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache_Q4
else: else:
cache = ExLlamaV2Cache(model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache
if shared.args.autosplit: # Use TP if specified
if shared.args.enable_tp:
cache = ExLlamaV2Cache_TP(model, base=cache_type)
else:
cache = cache_type(model, lazy=shared.args.autosplit)
if shared.args.autosplit and not shared.args.enable_tp:
model.load_autosplit(cache) model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config) tokenizer = ExLlamaV2Tokenizer(config)

View File

@ -9,6 +9,7 @@ from exllamav2 import (
ExLlamaV2Cache, ExLlamaV2Cache,
ExLlamaV2Cache_8bit, ExLlamaV2Cache_8bit,
ExLlamaV2Cache_Q4, ExLlamaV2Cache_Q4,
ExLlamaV2Cache_TP,
ExLlamaV2Config ExLlamaV2Config
) )
from torch.nn import CrossEntropyLoss from torch.nn import CrossEntropyLoss
@ -20,14 +21,6 @@ from modules.logging_colors import logger
try: try:
import flash_attn import flash_attn
except ModuleNotFoundError:
logger.warning(
'You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage '
'to be a lot higher than it could be.\n'
'Try installing flash-attention following the instructions here: '
'https://github.com/Dao-AILab/flash-attention#installation-and-features'
)
pass
except Exception: except Exception:
logger.warning('Failed to load flash-attention due to the following error:\n') logger.warning('Failed to load flash-attention due to the following error:\n')
traceback.print_exc() traceback.print_exc()
@ -42,21 +35,30 @@ class Exllamav2HF(PreTrainedModel):
self.ex_model = ExLlamaV2(config) self.ex_model = ExLlamaV2(config)
if not shared.args.autosplit: split = None
split = None if shared.args.gpu_split:
if shared.args.gpu_split: split = [float(alloc) for alloc in shared.args.gpu_split.split(",")]
split = [float(alloc) for alloc in shared.args.gpu_split.split(",")]
if shared.args.enable_tp:
self.ex_model.load_tp(split)
elif not shared.args.autosplit:
self.ex_model.load(split) self.ex_model.load(split)
# Determine the correct cache type
if shared.args.cache_8bit: if shared.args.cache_8bit:
self.ex_cache = ExLlamaV2Cache_8bit(self.ex_model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache_8bit
elif shared.args.cache_4bit: elif shared.args.cache_4bit:
self.ex_cache = ExLlamaV2Cache_Q4(self.ex_model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache_Q4
else: else:
self.ex_cache = ExLlamaV2Cache(self.ex_model, lazy=shared.args.autosplit) cache_type = ExLlamaV2Cache
if shared.args.autosplit: # Use TP if specified
if shared.args.enable_tp:
self.ex_cache = ExLlamaV2Cache_TP(self.ex_model, base=cache_type)
else:
self.ex_cache = cache_type(self.ex_model, lazy=shared.args.autosplit)
if shared.args.autosplit and not shared.args.enable_tp:
self.ex_model.load_autosplit(self.ex_cache) self.ex_model.load_autosplit(self.ex_cache)
self.past_seq = None self.past_seq = None

View File

@ -2,12 +2,12 @@ import importlib
import platform import platform
from typing import Sequence from typing import Sequence
import numpy as np
from tqdm import tqdm from tqdm import tqdm
from modules import shared from modules import shared
from modules.cache_utils import process_llamacpp_cache from modules.cache_utils import process_llamacpp_cache
imported_module = None imported_module = None
@ -57,8 +57,6 @@ def eval_with_progress(self, tokens: Sequence[int]):
with tqdm to show prompt processing progress. with tqdm to show prompt processing progress.
""" """
assert self._ctx.ctx is not None
assert self._batch.batch is not None
self._ctx.kv_cache_seq_rm(-1, self.n_tokens, -1) self._ctx.kv_cache_seq_rm(-1, self.n_tokens, -1)
if len(tokens) > self.n_batch: if len(tokens) > self.n_batch:
@ -80,13 +78,20 @@ def eval_with_progress(self, tokens: Sequence[int]):
if self.context_params.logits_all: if self.context_params.logits_all:
rows = n_tokens rows = n_tokens
cols = self._n_vocab cols = self._n_vocab
logits = self._ctx.get_logits()[: rows * cols] logits = np.ctypeslib.as_array(
self.scores[n_past : n_past + n_tokens, :].reshape(-1)[: :] = logits self._ctx.get_logits(), shape=(rows * cols,)
)
self.scores[n_past : n_past + n_tokens, :].reshape(-1)[::] = logits
self.last_updated_index = n_past + n_tokens - 1
else: else:
rows = 1 rows = 1
cols = self._n_vocab cols = self._n_vocab
logits = self._ctx.get_logits()[: rows * cols] logits = np.ctypeslib.as_array(
self.scores[n_past + n_tokens - 1, :].reshape(-1)[: :] = logits self._ctx.get_logits(), shape=(rows * cols,)
)
last_token_index = min(n_past + n_tokens - 1, self.scores.shape[0] - 1)
self.scores[last_token_index, :] = logits.reshape(-1)
self.last_updated_index = last_token_index
# Update n_tokens # Update n_tokens
self.n_tokens += n_tokens self.n_tokens += n_tokens

View File

@ -127,7 +127,7 @@ class LlamacppHF(PreTrainedModel):
self.model.reset() self.model.reset()
self.model.eval(seq) self.model.eval(seq)
logits = torch.tensor(self.model.scores[self.model.n_tokens - 1, :]).view(1, 1, -1).to(input_ids.device) logits = torch.tensor(self.model.scores[self.model.last_updated_index, :]).view(1, 1, -1).to(input_ids.device)
else: else:
self.model.reset() self.model.reset()
self.model.eval(seq) self.model.eval(seq)
@ -205,5 +205,6 @@ class LlamacppHF(PreTrainedModel):
Llama = llama_cpp_lib().Llama Llama = llama_cpp_lib().Llama
model = Llama(**params) model = Llama(**params)
model.last_updated_index = -1
return LlamacppHF(model, model_file) return LlamacppHF(model, model_file)

View File

@ -90,6 +90,7 @@ loaders_and_params = OrderedDict({
'cache_8bit', 'cache_8bit',
'cache_4bit', 'cache_4bit',
'autosplit', 'autosplit',
'enable_tp',
'alpha_value', 'alpha_value',
'compress_pos_emb', 'compress_pos_emb',
'trust_remote_code', 'trust_remote_code',
@ -105,6 +106,7 @@ loaders_and_params = OrderedDict({
'cache_8bit', 'cache_8bit',
'cache_4bit', 'cache_4bit',
'autosplit', 'autosplit',
'enable_tp',
'alpha_value', 'alpha_value',
'compress_pos_emb', 'compress_pos_emb',
'exllamav2_info', 'exllamav2_info',
@ -168,6 +170,8 @@ def transformers_samplers():
'dry_base', 'dry_base',
'dry_allowed_length', 'dry_allowed_length',
'dry_sequence_breakers', 'dry_sequence_breakers',
'xtc_threshold',
'xtc_probability',
'seed', 'seed',
'do_sample', 'do_sample',
'penalty_alpha', 'penalty_alpha',
@ -242,6 +246,8 @@ loaders_samplers = {
'dry_base', 'dry_base',
'dry_allowed_length', 'dry_allowed_length',
'dry_sequence_breakers', 'dry_sequence_breakers',
'xtc_threshold',
'xtc_probability',
'seed', 'seed',
'do_sample', 'do_sample',
'mirostat_mode', 'mirostat_mode',
@ -304,6 +310,8 @@ loaders_samplers = {
'dry_base', 'dry_base',
'dry_allowed_length', 'dry_allowed_length',
'dry_sequence_breakers', 'dry_sequence_breakers',
'xtc_threshold',
'xtc_probability',
'seed', 'seed',
'do_sample', 'do_sample',
'mirostat_mode', 'mirostat_mode',

View File

@ -70,11 +70,11 @@ def load_model(model_name, loader=None):
shared.model_name = model_name shared.model_name = model_name
load_func_map = { load_func_map = {
'Transformers': huggingface_loader, 'Transformers': huggingface_loader,
'AutoGPTQ': AutoGPTQ_loader,
'llama.cpp': llamacpp_loader, 'llama.cpp': llamacpp_loader,
'llamacpp_HF': llamacpp_HF_loader, 'llamacpp_HF': llamacpp_HF_loader,
'ExLlamav2': ExLlamav2_loader, 'ExLlamav2': ExLlamav2_loader,
'ExLlamav2_HF': ExLlamav2_HF_loader, 'ExLlamav2_HF': ExLlamav2_HF_loader,
'AutoGPTQ': AutoGPTQ_loader,
'HQQ': HQQ_loader, 'HQQ': HQQ_loader,
'TensorRT-LLM': TensorRT_LLM_loader, 'TensorRT-LLM': TensorRT_LLM_loader,
} }
@ -302,12 +302,6 @@ def llamacpp_HF_loader(model_name):
return model return model
def AutoGPTQ_loader(model_name):
import modules.AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
def ExLlamav2_loader(model_name): def ExLlamav2_loader(model_name):
from modules.exllamav2 import Exllamav2Model from modules.exllamav2 import Exllamav2Model
@ -321,9 +315,21 @@ def ExLlamav2_HF_loader(model_name):
return Exllamav2HF.from_pretrained(model_name) return Exllamav2HF.from_pretrained(model_name)
def AutoGPTQ_loader(model_name):
try:
import modules.AutoGPTQ_loader
except ModuleNotFoundError:
raise ModuleNotFoundError("Failed to import 'autogptq'. Please install it manually following the instructions in the AutoGPTQ GitHub repository.")
return modules.AutoGPTQ_loader.load_quantized(model_name)
def HQQ_loader(model_name): def HQQ_loader(model_name):
from hqq.core.quantize import HQQBackend, HQQLinear try:
from hqq.models.hf.base import AutoHQQHFModel from hqq.core.quantize import HQQBackend, HQQLinear
from hqq.models.hf.base import AutoHQQHFModel
except ModuleNotFoundError:
raise ModuleNotFoundError("Failed to import 'hqq'. Please install it manually following the instructions in the HQQ GitHub repository.")
logger.info(f"Loading HQQ model with backend: \"{shared.args.hqq_backend}\"") logger.info(f"Loading HQQ model with backend: \"{shared.args.hqq_backend}\"")
@ -334,7 +340,10 @@ def HQQ_loader(model_name):
def TensorRT_LLM_loader(model_name): def TensorRT_LLM_loader(model_name):
from modules.tensorrt_llm import TensorRTLLMModel try:
from modules.tensorrt_llm import TensorRTLLMModel
except ModuleNotFoundError:
raise ModuleNotFoundError("Failed to import 'tensorrt_llm'. Please install it manually following the instructions in the TensorRT-LLM GitHub repository.")
model = TensorRTLLMModel.from_pretrained(model_name) model = TensorRTLLMModel.from_pretrained(model_name)
return model return model

View File

@ -44,7 +44,9 @@ def default_preset():
'dry_base': 1.75, 'dry_base': 1.75,
'dry_allowed_length': 2, 'dry_allowed_length': 2,
'dry_sequence_breakers': '"\\n", ":", "\\"", "*"', 'dry_sequence_breakers': '"\\n", ":", "\\"", "*"',
'sampler_priority': 'temperature\ndynamic_temperature\nquadratic_sampling\ntop_k\ntop_p\ntypical_p\nepsilon_cutoff\neta_cutoff\ntfs\ntop_a\nmin_p\nmirostat' 'xtc_threshold': 0.1,
'xtc_probability': 0,
'sampler_priority': 'repetition_penalty\npresence_penalty\nfrequency_penalty\ndry\ntemperature\ndynamic_temperature\nquadratic_sampling\ntop_k\ntop_p\ntypical_p\nepsilon_cutoff\neta_cutoff\ntfs\ntop_a\nmin_p\nmirostat\nxtc\nencoder_repetition_penalty\nno_repeat_ngram'
} }

View File

@ -1,6 +1,7 @@
import json import json
import math import math
import pprint import pprint
import random
import torch import torch
import transformers import transformers
@ -191,6 +192,53 @@ class TopALogitsWarper(LogitsWarper):
return scores return scores
# Exclude Top Choices (XTC)
class XTCLogitsWarper(LogitsWarper):
def __init__(self, threshold: float, probability: float, filter_value: float = -float("Inf")):
self.threshold = threshold
self.probability = probability
self.filter_value = filter_value
self.special_token_ids = [
shared.tokenizer.encode("\n")[-1],
]
if shared.tokenizer.eos_token_id is not None:
self.special_token_ids.append(shared.tokenizer.eos_token_id)
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
# `random` returns values in the half-open range [0, 1), so setting `probability`
# to 0 means the sampler never takes action, while setting it to 1 means the sampler
# always takes action.
#
# Note that while XTC is most intuitively described as "if multiple tokens meet
# the threshold, then with probability...", reversing the two conditions is logically
# equivalent, and improves performance because processing can immediately be stopped
# if the random check fails.
if random.random() >= self.probability:
return scores
sorted_logits, sorted_indices = torch.sort(scores, descending=True)
probs = sorted_logits.softmax(dim=-1)
sorted_indices_to_remove = torch.full_like(probs, False, dtype=torch.bool)
# This operation sets exactly those indices to `True` for which the next index has
# probability above the threshold. Since `probs` is sorted, those are the indices
# of all tokens that meet the threshold, *except* the least probable one.
sorted_indices_to_remove[..., :-1] = probs[..., 1:] >= self.threshold
# Convert sorted_indices_to_remove to the original indices
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
# If newline or EOS tokens would be removed, return the original scores
if indices_to_remove[:, self.special_token_ids].any():
return scores
# Otherwise, remove tokens with the mask
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class DRYLogitsProcessor(LogitsProcessor): class DRYLogitsProcessor(LogitsProcessor):
def __init__(self, multiplier: float, base: float, allowed_length: int, sequence_breakers: set[int], _range: int): def __init__(self, multiplier: float, base: float, allowed_length: int, sequence_breakers: set[int], _range: int):
self.multiplier = multiplier self.multiplier = multiplier
@ -323,62 +371,141 @@ class SpyLogitsWarper(LogitsWarper):
class RepetitionPenaltyLogitsProcessorWithRange(LogitsProcessor): class RepetitionPenaltyLogitsProcessorWithRange(LogitsProcessor):
''' def __init__(self, penalty: float, _range: int):
Copied from the transformers library
'''
def __init__(self, penalty: float, presence_penalty: float, frequency_penalty: float, _range: int):
if not (penalty > 0): if not (penalty > 0):
raise ValueError(f"`penalty` has to be strictly positive, but is {penalty}") raise ValueError(f"`penalty` has to be strictly positive, but is {penalty}")
self.penalty = penalty self.penalty = penalty
self.presence_penalty = presence_penalty
self.frequency_penalty = frequency_penalty
self._range = _range self._range = _range
def apply_repetition_penalty(self, input_ids_row, scores_row):
unique_ids = torch.unique(input_ids_row)
score = torch.gather(scores_row, 0, unique_ids)
# Apply multiplicative repetition penalty
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
scores_row.scatter_(0, unique_ids, score)
return scores_row
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor: def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
input_ids = input_ids[:, -self._range:] input_ids = input_ids[:, -self._range:]
# We loop here because torch.unique() needs to process each row separately in the
# case that batch_size > 1.
for input_ids_row, scores_row in zip(input_ids, scores): for input_ids_row, scores_row in zip(input_ids, scores):
unique_ids, counts = torch.unique(input_ids_row, return_counts=True) scores_row = self.apply_repetition_penalty(input_ids_row, scores_row)
score = torch.gather(scores_row, 0, unique_ids)
# multiplicative repetition penalty
# if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
scores_row.scatter_(0, unique_ids, score)
# presence_penalty and frequency_penalty
raw_presence_penalty = (counts > 0).to(scores.dtype)
raw_frequency_penalty = counts.to(scores.dtype)
additive_penalty = raw_presence_penalty * self.presence_penalty + raw_frequency_penalty * self.frequency_penalty
scores_row.scatter_add_(0, unique_ids, -additive_penalty)
return scores return scores
def get_logits_warper_patch(self, generation_config, **kwargs): class PresencePenaltyLogitsProcessor(LogitsProcessor):
def __init__(self, presence_penalty: float, _range: int):
self.presence_penalty = presence_penalty
self._range = _range
def apply_presence_penalty(self, input_ids_row, scores_row):
unique_ids, counts = torch.unique(input_ids_row, return_counts=True)
# Apply presence penalty
raw_presence_penalty = (counts > 0).to(scores_row.dtype)
presence_penalty = raw_presence_penalty * self.presence_penalty
scores_row.scatter_add_(0, unique_ids, -presence_penalty)
return scores_row
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
input_ids = input_ids[:, -self._range:]
for input_ids_row, scores_row in zip(input_ids, scores):
scores_row = self.apply_presence_penalty(input_ids_row, scores_row)
return scores
class FrequencyPenaltyLogitsProcessor(LogitsProcessor):
def __init__(self, frequency_penalty: float, _range: int):
self.frequency_penalty = frequency_penalty
self._range = _range
def apply_frequency_penalty(self, input_ids_row, scores_row):
unique_ids, counts = torch.unique(input_ids_row, return_counts=True)
# Apply frequency penalty
raw_frequency_penalty = counts.to(scores_row.dtype)
frequency_penalty = raw_frequency_penalty * self.frequency_penalty
scores_row.scatter_add_(0, unique_ids, -frequency_penalty)
return scores_row
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
input_ids = input_ids[:, -self._range:]
for input_ids_row, scores_row in zip(input_ids, scores):
scores_row = self.apply_frequency_penalty(input_ids_row, scores_row)
return scores
def get_logits_processor_patch(self, **kwargs):
generation_config = kwargs['generation_config']
# Parameter sanitization # Parameter sanitization
if isinstance(generation_config.temperature, int): if isinstance(generation_config.temperature, int):
generation_config.temperature = float(generation_config.temperature) # Must be float generation_config.temperature = float(generation_config.temperature) # Must be float
# Get the original warpers # Get the original warpers
warpers = self._get_logits_warper_old(generation_config, **kwargs) warpers = self._get_logits_processor_old(**kwargs)
# Replace temperature with our modified class. for i in range(len(warpers) - 1, -1, -1):
# Currently, it behaves identically to the original. # Replace temperature with our modified class.
for i in range(len(warpers)):
if warpers[i].__class__.__name__ == 'TemperatureLogitsWarper': if warpers[i].__class__.__name__ == 'TemperatureLogitsWarper':
warpers[i] = TemperatureLogitsWarperCustom( warpers[i] = TemperatureLogitsWarperCustom(
generation_config.temperature, generation_config.temperature,
) )
# Stuff we don't need
elif warpers[i].__class__.__name__ in ['SuppressTokensLogitsProcessor', 'RepetitionPenaltyLogitsProcessor']:
del warpers[i]
# Add custom warpers # Add custom warpers
warpers_to_add = LogitsProcessorList() warpers_to_add = LogitsProcessorList()
min_tokens_to_keep = 2 if generation_config.num_beams > 1 else 1 min_tokens_to_keep = 2 if generation_config.num_beams > 1 else 1
if generation_config.repetition_penalty is not None and generation_config.repetition_penalty != 1.0:
warpers_to_add.append(
RepetitionPenaltyLogitsProcessorWithRange(
penalty=generation_config.repetition_penalty,
_range=generation_config.repetition_penalty_range
)
)
if generation_config.presence_penalty is not None and generation_config.presence_penalty != 0.0:
warpers_to_add.append(
PresencePenaltyLogitsProcessor(
presence_penalty=generation_config.presence_penalty,
_range=generation_config.repetition_penalty_range
)
)
if generation_config.frequency_penalty is not None and generation_config.frequency_penalty != 0.0:
warpers_to_add.append(
FrequencyPenaltyLogitsProcessor(
frequency_penalty=generation_config.frequency_penalty,
_range=generation_config.repetition_penalty_range
)
)
if generation_config.dry_multiplier is not None and generation_config.dry_multiplier > 0.0:
dry_sequence_breakers = generation_config.dry_sequence_breakers
# Support both JSON array notation and comma-separated strings.
if not dry_sequence_breakers.startswith("["):
dry_sequence_breakers = "[" + dry_sequence_breakers + "]"
sequence_breaker_strings = json.loads(dry_sequence_breakers)
# Prefix with 'a' to get the correct encoding of the token at the end of a text.
sequence_breakers = {shared.tokenizer.encode(f'a{s}')[-1] for s in sequence_breaker_strings}
warpers.append(
DRYLogitsProcessor(
multiplier=generation_config.dry_multiplier,
base=generation_config.dry_base,
allowed_length=generation_config.dry_allowed_length,
sequence_breakers=sequence_breakers,
_range=generation_config.repetition_penalty_range,
)
)
if generation_config.tfs is not None and 0.0 <= generation_config.tfs < 1.0: if generation_config.tfs is not None and 0.0 <= generation_config.tfs < 1.0:
warpers_to_add.append( warpers_to_add.append(
TailFreeLogitsWarper( TailFreeLogitsWarper(
@ -395,6 +522,14 @@ def get_logits_warper_patch(self, generation_config, **kwargs):
) )
) )
if generation_config.xtc_probability is not None and generation_config.xtc_probability > 0:
warpers_to_add.append(
XTCLogitsWarper(
threshold=generation_config.xtc_threshold,
probability=generation_config.xtc_probability,
)
)
if generation_config.dynamic_temperature: if generation_config.dynamic_temperature:
warpers_to_add.append( warpers_to_add.append(
DynamicTemperatureLogitsWarper( DynamicTemperatureLogitsWarper(
@ -436,11 +571,10 @@ def get_logits_warper_patch(self, generation_config, **kwargs):
if generation_config.temperature_last: if generation_config.temperature_last:
for param_name in ['temperature', 'dynamic_temperature', 'quadratic_sampling']: for param_name in ['temperature', 'dynamic_temperature', 'quadratic_sampling']:
if param_name in sampler_priority: if param_name in sampler_priority:
if param_name in sampler_priority: index = sampler_priority.index(param_name)
index = sampler_priority.index(param_name) sampler_priority.append(sampler_priority.pop(index))
sampler_priority.append(sampler_priority.pop(index)) else:
else: sampler_priority.append(param_name)
sampler_priority.append(param_name)
class_name_to_nickname = { class_name_to_nickname = {
'DynamicTemperatureLogitsWarper': 'dynamic_temperature', 'DynamicTemperatureLogitsWarper': 'dynamic_temperature',
@ -454,17 +588,23 @@ def get_logits_warper_patch(self, generation_config, **kwargs):
'TopALogitsWarper': 'top_a', 'TopALogitsWarper': 'top_a',
'TopKLogitsWarper': 'top_k', 'TopKLogitsWarper': 'top_k',
'TopPLogitsWarper': 'top_p', 'TopPLogitsWarper': 'top_p',
'TypicalLogitsWarper': 'typical_p' 'TypicalLogitsWarper': 'typical_p',
'XTCLogitsWarper': 'xtc',
'RepetitionPenaltyLogitsProcessorWithRange': 'repetition_penalty',
'PresencePenaltyLogitsProcessor': 'presence_penalty',
'FrequencyPenaltyLogitsProcessor': 'frequency_penalty',
'DRYLogitsProcessor': 'dry',
'EncoderRepetitionPenaltyLogitsProcessor': 'encoder_repetition_penalty',
'NoRepeatNGramLogitsProcessor': 'no_repeat_ngram',
} }
def custom_sort_key(obj): def custom_sort_key(obj):
class_name = obj.__class__.__name__ class_name = obj.__class__.__name__
# Return a large value if class name is not mapped or if the mapped nickname is not in priority # Return -1 if class_name is not mapped
if class_name not in class_name_to_nickname or class_name_to_nickname[class_name] not in sampler_priority: if class_name not in class_name_to_nickname or class_name_to_nickname[class_name] not in sampler_priority:
return float('inf') return -1
# Return the index of the nickname in the priority list for sorting
return sampler_priority.index(class_name_to_nickname[class_name]) return sampler_priority.index(class_name_to_nickname[class_name])
# Sort the list using the custom key function # Sort the list using the custom key function
@ -482,49 +622,6 @@ def get_logits_warper_patch(self, generation_config, **kwargs):
return warpers return warpers
def get_logits_processor_patch(self, **kwargs):
generation_config = kwargs['generation_config']
do_rep_pen_hijack = (generation_config.repetition_penalty > 1) or (generation_config.presence_penalty != 0) or (generation_config.frequency_penalty != 0)
if do_rep_pen_hijack:
generation_config.repetition_penalty = 1.1 # Set to value > 1 to ensure RepetitionPenaltyLogitsProcessor is created
result = self._get_logits_processor_old(**kwargs)
if do_rep_pen_hijack:
for i in range(len(result)):
if result[i].__class__.__name__ == 'RepetitionPenaltyLogitsProcessor':
result[i] = RepetitionPenaltyLogitsProcessorWithRange(
generation_config.repetition_penalty,
generation_config.presence_penalty,
generation_config.frequency_penalty,
generation_config.repetition_penalty_range
)
if generation_config.dry_multiplier is not None and generation_config.dry_multiplier > 0.0:
dry_sequence_breakers = generation_config.dry_sequence_breakers
# Support both JSON array notation and comma-separated strings.
if not dry_sequence_breakers.startswith("["):
dry_sequence_breakers = "[" + dry_sequence_breakers + "]"
sequence_breaker_strings = json.loads(dry_sequence_breakers)
# Prefix with 'a' to get the correct encoding of the token at the end of a text.
sequence_breakers = {shared.tokenizer.encode(f'a{s}')[-1] for s in sequence_breaker_strings}
result.append(
DRYLogitsProcessor(
multiplier=generation_config.dry_multiplier,
base=generation_config.dry_base,
allowed_length=generation_config.dry_allowed_length,
sequence_breakers=sequence_breakers,
_range=generation_config.repetition_penalty_range,
)
)
return result
def generation_config_init_patch(self, **kwargs): def generation_config_init_patch(self, **kwargs):
self.__init___old(**kwargs) self.__init___old(**kwargs)
self.min_p = kwargs.pop("min_p", 0.0) self.min_p = kwargs.pop("min_p", 0.0)
@ -546,14 +643,13 @@ def generation_config_init_patch(self, **kwargs):
self.dry_base = kwargs.pop("dry_base", 1.75) self.dry_base = kwargs.pop("dry_base", 1.75)
self.dry_allowed_length = kwargs.pop("dry_allowed_length", 2) self.dry_allowed_length = kwargs.pop("dry_allowed_length", 2)
self.dry_sequence_breakers = kwargs.pop("dry_sequence_breakers", '"\\n", ":", "\\"", "*"') self.dry_sequence_breakers = kwargs.pop("dry_sequence_breakers", '"\\n", ":", "\\"", "*"')
self.xtc_threshold = kwargs.pop("xtc_threshold", 0.1)
self.xtc_probability = kwargs.pop("xtc_probability", 0)
self.temperature_last = kwargs.pop("temperature_last", False) self.temperature_last = kwargs.pop("temperature_last", False)
self.sampler_priority = kwargs.pop("sampler_priority", ['temperature', 'dynamic_temperature', 'quadratic_sampling', 'top_k', 'top_p', 'typical_p', 'epsilon_cutoff', 'eta_cutoff', 'tfs', 'top_a', 'min_p', 'mirostat']) self.sampler_priority = kwargs.pop("sampler_priority", ['repetition_penalty', 'presence_penalty', 'frequency_penalty', 'dry', 'temperature', 'dynamic_temperature', 'quadratic_sampling', 'top_k', 'top_p', 'typical_p', 'epsilon_cutoff', 'eta_cutoff', 'tfs', 'top_a', 'min_p', 'mirostat', 'xtc', 'encoder_repetition_penalty', 'no_repeat_ngram'])
def hijack_samplers(): def hijack_samplers():
transformers.GenerationMixin._get_logits_warper_old = transformers.GenerationMixin._get_logits_warper
transformers.GenerationMixin._get_logits_warper = get_logits_warper_patch
transformers.GenerationMixin._get_logits_processor_old = transformers.GenerationMixin._get_logits_processor transformers.GenerationMixin._get_logits_processor_old = transformers.GenerationMixin._get_logits_processor
transformers.GenerationMixin._get_logits_processor = get_logits_processor_patch transformers.GenerationMixin._get_logits_processor = get_logits_processor_patch

View File

@ -146,6 +146,7 @@ group.add_argument('--no_sdpa', action='store_true', help='Force Torch SDPA to n
group.add_argument('--cache_8bit', action='store_true', help='Use 8-bit cache to save VRAM.') group.add_argument('--cache_8bit', action='store_true', help='Use 8-bit cache to save VRAM.')
group.add_argument('--cache_4bit', action='store_true', help='Use Q4 cache to save VRAM.') group.add_argument('--cache_4bit', action='store_true', help='Use Q4 cache to save VRAM.')
group.add_argument('--num_experts_per_token', type=int, default=2, help='Number of experts to use for generation. Applies to MoE models like Mixtral.') group.add_argument('--num_experts_per_token', type=int, default=2, help='Number of experts to use for generation. Applies to MoE models like Mixtral.')
group.add_argument('--enable_tp', action='store_true', help='Enable Tensor Parallelism (TP) in ExLlamaV2.')
# AutoGPTQ # AutoGPTQ
group = parser.add_argument_group('AutoGPTQ') group = parser.add_argument_group('AutoGPTQ')

View File

@ -274,10 +274,10 @@ def get_reply_from_output_ids(output_ids, state=None, starting_from=0):
if (hasattr(shared.tokenizer, 'convert_ids_to_tokens') and len(output_ids) > starting_from) and not reply.startswith(' '): if (hasattr(shared.tokenizer, 'convert_ids_to_tokens') and len(output_ids) > starting_from) and not reply.startswith(' '):
first_token = shared.tokenizer.convert_ids_to_tokens(int(output_ids[starting_from])) first_token = shared.tokenizer.convert_ids_to_tokens(int(output_ids[starting_from]))
if isinstance(first_token, (bytes,)): if isinstance(first_token, (bytes,)):
#try to decode the bytes to a string # try to decode the bytes to a string
# if it fails, which means it's not a string in this turn, just ignore it
try: try:
first_token = first_token.decode('utf8') first_token = first_token.decode('utf8')
#if it fails, which means it's not a string in this turn, just ignore it
except UnicodeDecodeError: except UnicodeDecodeError:
first_token = '' first_token = ''
@ -289,7 +289,7 @@ def get_reply_from_output_ids(output_ids, state=None, starting_from=0):
def generate_reply_HF(question, original_question, seed, state, stopping_strings=None, is_chat=False): def generate_reply_HF(question, original_question, seed, state, stopping_strings=None, is_chat=False):
generate_params = {} generate_params = {}
for k in ['max_new_tokens', 'temperature', 'temperature_last', 'dynamic_temperature', 'dynatemp_low', 'dynatemp_high', 'dynatemp_exponent', 'smoothing_factor', 'smoothing_curve', 'top_p', 'min_p', 'top_k', 'repetition_penalty', 'presence_penalty', 'frequency_penalty', 'repetition_penalty_range', 'typical_p', 'tfs', 'top_a', 'guidance_scale', 'penalty_alpha', 'mirostat_mode', 'mirostat_tau', 'mirostat_eta', 'do_sample', 'encoder_repetition_penalty', 'no_repeat_ngram_size', 'dry_multiplier', 'dry_base', 'dry_allowed_length', 'dry_sequence_breakers']: for k in ['max_new_tokens', 'temperature', 'temperature_last', 'dynamic_temperature', 'dynatemp_low', 'dynatemp_high', 'dynatemp_exponent', 'smoothing_factor', 'smoothing_curve', 'top_p', 'min_p', 'top_k', 'repetition_penalty', 'presence_penalty', 'frequency_penalty', 'repetition_penalty_range', 'typical_p', 'tfs', 'top_a', 'guidance_scale', 'penalty_alpha', 'mirostat_mode', 'mirostat_tau', 'mirostat_eta', 'do_sample', 'encoder_repetition_penalty', 'no_repeat_ngram_size', 'dry_multiplier', 'dry_base', 'dry_allowed_length', 'dry_sequence_breakers', 'xtc_threshold', 'xtc_probability']:
if k in state: if k in state:
generate_params[k] = state[k] generate_params[k] = state[k]

View File

@ -90,6 +90,7 @@ def list_model_elements():
'cache_8bit', 'cache_8bit',
'cache_4bit', 'cache_4bit',
'autosplit', 'autosplit',
'enable_tp',
'threads', 'threads',
'threads_batch', 'threads_batch',
'n_batch', 'n_batch',
@ -158,6 +159,8 @@ def list_interface_input_elements():
'dry_base', 'dry_base',
'dry_allowed_length', 'dry_allowed_length',
'dry_sequence_breakers', 'dry_sequence_breakers',
'xtc_threshold',
'xtc_probability',
'do_sample', 'do_sample',
'penalty_alpha', 'penalty_alpha',
'mirostat_mode', 'mirostat_mode',

View File

@ -74,7 +74,7 @@ def handle_save_preset_confirm_click(filename, contents):
try: try:
utils.save_file(f"presets/{filename}.yaml", contents) utils.save_file(f"presets/{filename}.yaml", contents)
available_presets = utils.get_available_presets() available_presets = utils.get_available_presets()
output = gr.update(choices=available_presets, value=filename), output = gr.update(choices=available_presets, value=filename)
except Exception: except Exception:
output = gr.update() output = gr.update()
traceback.print_exc() traceback.print_exc()

View File

@ -136,6 +136,7 @@ def create_ui():
shared.gradio['disk'] = gr.Checkbox(label="disk", value=shared.args.disk) shared.gradio['disk'] = gr.Checkbox(label="disk", value=shared.args.disk)
shared.gradio['bf16'] = gr.Checkbox(label="bf16", value=shared.args.bf16) shared.gradio['bf16'] = gr.Checkbox(label="bf16", value=shared.args.bf16)
shared.gradio['autosplit'] = gr.Checkbox(label="autosplit", value=shared.args.autosplit, info='Automatically split the model tensors across the available GPUs.') shared.gradio['autosplit'] = gr.Checkbox(label="autosplit", value=shared.args.autosplit, info='Automatically split the model tensors across the available GPUs.')
shared.gradio['enable_tp'] = gr.Checkbox(label="enable_tp", value=shared.args.enable_tp, info='Enable Tensor Parallelism (TP).')
shared.gradio['no_flash_attn'] = gr.Checkbox(label="no_flash_attn", value=shared.args.no_flash_attn) shared.gradio['no_flash_attn'] = gr.Checkbox(label="no_flash_attn", value=shared.args.no_flash_attn)
shared.gradio['no_xformers'] = gr.Checkbox(label="no_xformers", value=shared.args.no_xformers) shared.gradio['no_xformers'] = gr.Checkbox(label="no_xformers", value=shared.args.no_xformers)
shared.gradio['no_sdpa'] = gr.Checkbox(label="no_sdpa", value=shared.args.no_sdpa) shared.gradio['no_sdpa'] = gr.Checkbox(label="no_sdpa", value=shared.args.no_sdpa)

View File

@ -45,6 +45,10 @@ def create_ui(default_preset):
shared.gradio['dry_base'] = gr.Slider(1, 4, value=generate_params['dry_base'], step=0.01, label='dry_base', info='Controls how fast the penalty grows with increasing sequence length.') shared.gradio['dry_base'] = gr.Slider(1, 4, value=generate_params['dry_base'], step=0.01, label='dry_base', info='Controls how fast the penalty grows with increasing sequence length.')
shared.gradio['dry_sequence_breakers'] = gr.Textbox(value=generate_params['dry_sequence_breakers'], label='dry_sequence_breakers', info='Tokens across which sequence matching is not continued. Specified as a comma-separated list of quoted strings.') shared.gradio['dry_sequence_breakers'] = gr.Textbox(value=generate_params['dry_sequence_breakers'], label='dry_sequence_breakers', info='Tokens across which sequence matching is not continued. Specified as a comma-separated list of quoted strings.')
with gr.Blocks():
shared.gradio['xtc_threshold'] = gr.Slider(0, 0.5, value=generate_params['xtc_threshold'], step=0.01, label='xtc_threshold', info='If 2 or more tokens have probability above this threshold, consider removing all but the last one.')
shared.gradio['xtc_probability'] = gr.Slider(0, 1, value=generate_params['xtc_probability'], step=0.01, label='xtc_probability', info='Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.')
gr.Markdown("[Learn more](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab)") gr.Markdown("[Learn more](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab)")
with gr.Column(): with gr.Column():

View File

@ -16,9 +16,9 @@ import sys
# Define the required PyTorch version # Define the required PyTorch version
TORCH_VERSION = "2.2.2" TORCH_VERSION = "2.4.1"
TORCHVISION_VERSION = "0.17.2" TORCHVISION_VERSION = "0.19.1"
TORCHAUDIO_VERSION = "2.2.2" TORCHAUDIO_VERSION = "2.4.1"
# Environment # Environment
script_dir = os.getcwd() script_dir = os.getcwd()
@ -117,7 +117,7 @@ def update_pytorch():
elif is_cuda: elif is_cuda:
install_pytorch += "--index-url https://download.pytorch.org/whl/cu121" install_pytorch += "--index-url https://download.pytorch.org/whl/cu121"
elif is_rocm: elif is_rocm:
install_pytorch += "--index-url https://download.pytorch.org/whl/rocm5.6" install_pytorch += "--index-url https://download.pytorch.org/whl/rocm6.1"
elif is_cpu: elif is_cpu:
install_pytorch += "--index-url https://download.pytorch.org/whl/cpu" install_pytorch += "--index-url https://download.pytorch.org/whl/cpu"
elif is_intel: elif is_intel:
@ -189,8 +189,11 @@ def run_cmd(cmd, assert_success=False, environment=False, capture_output=False,
conda_sh_path = os.path.join(script_dir, "installer_files", "conda", "etc", "profile.d", "conda.sh") conda_sh_path = os.path.join(script_dir, "installer_files", "conda", "etc", "profile.d", "conda.sh")
cmd = f'. "{conda_sh_path}" && conda activate "{conda_env_path}" && {cmd}' cmd = f'. "{conda_sh_path}" && conda activate "{conda_env_path}" && {cmd}'
# Set executable to None for Windows, bash for everything else
executable = None if is_windows() else 'bash'
# Run shell commands # Run shell commands
result = subprocess.run(cmd, shell=True, capture_output=capture_output, env=env) result = subprocess.run(cmd, shell=True, capture_output=capture_output, env=env, executable=executable)
# Assert the command ran successfully # Assert the command ran successfully
if assert_success and result.returncode != 0: if assert_success and result.returncode != 0:
@ -239,7 +242,7 @@ def install_webui():
"What is your GPU?", "What is your GPU?",
{ {
'A': 'NVIDIA', 'A': 'NVIDIA',
'B': 'AMD (Linux/MacOS only. Requires ROCm SDK 5.6 on Linux)', 'B': 'AMD (Linux/MacOS only. Requires ROCm SDK 6.1 on Linux)',
'C': 'Apple M Series', 'C': 'Apple M Series',
'D': 'Intel Arc (IPEX)', 'D': 'Intel Arc (IPEX)',
'N': 'None (I want to run models in CPU mode)' 'N': 'None (I want to run models in CPU mode)'
@ -294,7 +297,7 @@ def install_webui():
else: else:
install_pytorch += "--index-url https://download.pytorch.org/whl/cu121" install_pytorch += "--index-url https://download.pytorch.org/whl/cu121"
elif selected_gpu == "AMD": elif selected_gpu == "AMD":
install_pytorch += "--index-url https://download.pytorch.org/whl/rocm5.6" install_pytorch += "--index-url https://download.pytorch.org/whl/rocm6.1"
elif selected_gpu in ["APPLE", "NONE"]: elif selected_gpu in ["APPLE", "NONE"]:
install_pytorch += "--index-url https://download.pytorch.org/whl/cpu" install_pytorch += "--index-url https://download.pytorch.org/whl/cpu"
elif selected_gpu == "INTEL": elif selected_gpu == "INTEL":
@ -310,7 +313,7 @@ def install_webui():
if selected_gpu == "INTEL": if selected_gpu == "INTEL":
# Install oneAPI dependencies via conda # Install oneAPI dependencies via conda
print_big_message("Installing Intel oneAPI runtime libraries.") print_big_message("Installing Intel oneAPI runtime libraries.")
run_cmd("conda install -y -c intel dpcpp-cpp-rt=2024.0 mkl-dpcpp=2024.0") run_cmd("conda install -y -c https://software.repos.intel.com/python/conda/ -c conda-forge dpcpp-cpp-rt=2024.0 mkl-dpcpp=2024.0")
# Install libuv required by Intel-patched torch # Install libuv required by Intel-patched torch
run_cmd("conda install -y libuv") run_cmd("conda install -y libuv")
@ -326,7 +329,7 @@ def install_extensions_requirements():
print_big_message("Installing extensions requirements.\nSome of these may fail on Windows.\nDon\'t worry if you see error messages, as they will not affect the main program.") print_big_message("Installing extensions requirements.\nSome of these may fail on Windows.\nDon\'t worry if you see error messages, as they will not affect the main program.")
extensions = get_extensions_names() extensions = get_extensions_names()
for i, extension in enumerate(extensions): for i, extension in enumerate(extensions):
print(f"\n\n--- [{i+1}/{len(extensions)}]: {extension}\n\n") print(f"\n\n--- [{i + 1}/{len(extensions)}]: {extension}\n\n")
extension_req_path = os.path.join("extensions", extension, "requirements.txt") extension_req_path = os.path.join("extensions", extension, "requirements.txt")
run_cmd(f"python -m pip install -r {extension_req_path} --upgrade", assert_success=False, environment=True) run_cmd(f"python -m pip install -r {extension_req_path} --upgrade", assert_success=False, environment=True)

View File

@ -1,13 +1,10 @@
accelerate==0.33.* accelerate==0.33.*
aqlm[gpu,cpu]==1.1.6; platform_system == "Linux" bitsandbytes==0.44.*
auto-gptq==0.7.1
bitsandbytes==0.43.*
colorama colorama
datasets datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -26,7 +23,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -39,38 +36,30 @@ soundfile
openai-whisper openai-whisper
# llama-cpp-python (CPU only, AVX2) # llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
# llama-cpp-python (CUDA, no tensor cores) # llama-cpp-python (CUDA, no tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
# llama-cpp-python (CUDA, tensor cores) # llama-cpp-python (CUDA, tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
# CUDA wheels # CUDA wheels
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.2.2cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.4.1cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.2.2cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.4.1cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,18 +33,14 @@ sse-starlette==1.6.5
tiktoken tiktoken
# llama-cpp-python (CPU only, AVX2) # llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
# AMD wheels # AMD wheels
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/rocm/llama_cpp_python_cuda-0.2.90+rocm5.6.1-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/rocm/llama_cpp_python_cuda-0.3.1+rocm6.1.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/rocm/llama_cpp_python_cuda-0.2.90+rocm5.6.1-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/rocm/llama_cpp_python_cuda-0.3.1+rocm6.1.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+rocm5.6.torch2.2.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+rocm6.1.torch2.4.1-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+rocm5.6.torch2.2.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+rocm6.1.torch2.4.1-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7+rocm561-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7+rocm561-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,16 +33,12 @@ sse-starlette==1.6.5
tiktoken tiktoken
# llama-cpp-python (CPU only, no AVX2) # llama-cpp-python (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
# AMD wheels # AMD wheels
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+rocm5.6.torch2.2.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+rocm6.1.torch2.4.1-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+rocm5.6.torch2.2.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+rocm6.1.torch2.4.1-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7+rocm561-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7+rocm561-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,8 +33,8 @@ sse-starlette==1.6.5
tiktoken tiktoken
# Mac wheels # Mac wheels
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp311-cp311-macosx_12_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp311-cp311-macosx_12_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp310-cp310-macosx_12_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp310-cp310-macosx_12_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp311-cp311-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp311-cp311-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp310-cp310-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp310-cp310-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,10 +33,10 @@ sse-starlette==1.6.5
tiktoken tiktoken
# Mac wheels # Mac wheels
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp311-cp311-macosx_12_0_arm64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp311-cp311-macosx_12_0_arm64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp310-cp310-macosx_12_0_arm64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp310-cp310-macosx_12_0_arm64.whl; platform_system == "Darwin" and platform_release >= "21.0.0" and platform_release < "22.0.0" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp311-cp311-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp311-cp311-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp310-cp310-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp310-cp310-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp311-cp311-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp311-cp311-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.90-cp310-cp310-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.3.1-cp310-cp310-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,7 +33,7 @@ sse-starlette==1.6.5
tiktoken tiktoken
# llama-cpp-python (CPU only, AVX2) # llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -34,7 +33,7 @@ sse-starlette==1.6.5
tiktoken tiktoken
# llama-cpp-python (CPU only, no AVX2) # llama-cpp-python (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"

View File

@ -1,13 +1,10 @@
accelerate==0.33.* accelerate==0.33.*
aqlm[gpu,cpu]==1.1.6; platform_system == "Linux" bitsandbytes==0.44.*
auto-gptq==0.7.1
bitsandbytes==0.43.*
colorama colorama
datasets datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -26,7 +23,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb
@ -37,38 +34,30 @@ sse-starlette==1.6.5
tiktoken tiktoken
# llama-cpp-python (CPU only, no AVX2) # llama-cpp-python (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.90+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.3.1+cpuavx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
# llama-cpp-python (CUDA, no tensor cores) # llama-cpp-python (CUDA, no tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121avx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121avx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121avx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121avx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121avx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121avx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.90+cu121avx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.3.1+cu121avx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
# llama-cpp-python (CUDA, tensor cores) # llama-cpp-python (CUDA, tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121avx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121avx-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121avx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121avx-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121avx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121avx-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.90+cu121avx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.3.1+cu121avx-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
# CUDA wheels # CUDA wheels
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0+cu121.torch2.2.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3+cu121.torch2.4.1-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.2.0/exllamav2-0.2.0-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64" https://github.com/oobabooga/exllamav2/releases/download/v0.2.3/exllamav2-0.2.3-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.2.2cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11" https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.4.1cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.2.2cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10" https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.4.1cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11" https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10" https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ/releases/download/0.2.6/autoawq-0.2.6-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/AutoAWQ_kernels/releases/download/0.0.7/autoawq_kernels-0.0.7-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

View File

@ -4,7 +4,6 @@ datasets
einops einops
fastapi==0.112.4 fastapi==0.112.4
gradio==4.26.* gradio==4.26.*
hqq==0.1.7.post3
jinja2==3.1.4 jinja2==3.1.4
lm_eval==0.3.0 lm_eval==0.3.0
markdown markdown
@ -23,7 +22,7 @@ safetensors==0.4.*
scipy scipy
sentencepiece sentencepiece
tensorboard tensorboard
transformers==4.44.* transformers==4.45.*
tqdm tqdm
wandb wandb