Commit Graph

213 Commits

Author SHA1 Message Date
draff
804486214b Re-implement --load-in-4bit and update --llama-bits arg description 2023-03-10 23:21:01 +00:00
ItsLogic
9ba8156a70
remove unnecessary Path() 2023-03-10 22:33:58 +00:00
draff
e6c631aea4 Replace --load-in-4bit with --llama-bits
Replaces --load-in-4bit with a more flexible --llama-bits arg to allow for 2 and 3 bit models as well. This commit also fixes a loading issue with .pt files which are not in the root of the models folder
2023-03-10 21:36:45 +00:00
oobabooga
026d60bd34 Remove default preset that didn't do anything 2023-03-10 14:01:02 -03:00
oobabooga
e9dbdafb14
Merge branch 'main' into pt-path-changes 2023-03-10 11:03:42 -03:00
oobabooga
706a03b2cb Minor changes 2023-03-10 11:02:25 -03:00
oobabooga
de7dd8b6aa Add comments 2023-03-10 10:54:08 -03:00
oobabooga
e461c0b7a0 Move the import to the top 2023-03-10 10:51:12 -03:00
deepdiffuser
9fbd60bf22 add no_split_module_classes to prevent tensor split error 2023-03-10 05:30:47 -08:00
deepdiffuser
ab47044459 add multi-gpu support for 4bit gptq LLaMA 2023-03-10 04:52:45 -08:00
rohvani
2ac2913747 fix reference issue 2023-03-09 20:13:23 -08:00
rohvani
826e297b0e add llama-65b-4bit support & multiple pt paths 2023-03-09 18:31:32 -08:00
oobabooga
9849aac0f1 Don't show .pt models in the list 2023-03-09 21:54:50 -03:00
oobabooga
74102d5ee4 Insert to the path instead of appending 2023-03-09 20:51:22 -03:00
oobabooga
2965aa1625 Check if the .pt file exists 2023-03-09 20:48:51 -03:00
oobabooga
828a524f9a Add LLaMA 4-bit support 2023-03-09 15:50:26 -03:00
oobabooga
59b5f7a4b7 Improve usage of stopping_criteria 2023-03-08 12:13:40 -03:00
oobabooga
add9330e5e Bug fixes 2023-03-08 11:26:29 -03:00
Xan
5648a41a27 Merge branch 'main' of https://github.com/xanthousm/text-generation-webui 2023-03-08 22:08:54 +11:00
Xan
ad6b699503 Better TTS with autoplay
- Adds "still_streaming" to shared module for extensions to know if generation is complete
- Changed TTS extension with new options:
   - Show text under the audio widget
   - Automatically play the audio once text generation finishes
   - manage the generated wav files (only keep files for finished generations, optional max file limit)
   - [wip] ability to change voice pitch and speed
- added 'tensorboard' to requirements, since python sent "tensorboard not found" errors after a fresh installation.
2023-03-08 22:02:17 +11:00
oobabooga
33fb6aed74 Minor bug fix 2023-03-08 03:08:16 -03:00
oobabooga
ad2970374a Readability improvements 2023-03-08 03:00:06 -03:00
oobabooga
72d539dbff Better separate the FlexGen case 2023-03-08 02:54:47 -03:00
oobabooga
0e16c0bacb Remove redeclaration of a function 2023-03-08 02:50:49 -03:00
oobabooga
ab50f80542 New text streaming method (much faster) 2023-03-08 02:46:35 -03:00
oobabooga
8e89bc596b Fix encode() for RWKV 2023-03-07 23:15:46 -03:00
oobabooga
19a34941ed Add proper streaming to RWKV 2023-03-07 18:17:56 -03:00
oobabooga
8660227e1b Add top_k to RWKV 2023-03-07 17:24:28 -03:00
oobabooga
153dfeb4dd Add --rwkv-cuda-on parameter, bump rwkv version 2023-03-06 20:12:54 -03:00
oobabooga
6904a507c6 Change some parameters 2023-03-06 16:29:43 -03:00
oobabooga
20bd645f6a Fix bug in multigpu setups (attempt 3) 2023-03-06 15:58:18 -03:00
oobabooga
09a7c36e1b Minor improvement while running custom models 2023-03-06 15:36:35 -03:00
oobabooga
24c4c20391 Fix bug in multigpu setups (attempt #2) 2023-03-06 15:23:29 -03:00
oobabooga
d88b7836c6 Fix bug in multigpu setups 2023-03-06 14:58:30 -03:00
oobabooga
5bed607b77 Increase repetition frequency/penalty for RWKV 2023-03-06 14:25:48 -03:00
oobabooga
bf56b6c1fb Load settings.json without the need for --settings settings.json
This is for setting UI defaults
2023-03-06 10:57:45 -03:00
oobabooga
e91f4bc25a Add RWKV tokenizer 2023-03-06 08:45:49 -03:00
oobabooga
c855b828fe Better handle <USER> 2023-03-05 17:01:47 -03:00
oobabooga
2af66a4d4c Fix <USER> in pygmalion replies 2023-03-05 16:08:50 -03:00
oobabooga
a54b91af77 Improve readability 2023-03-05 10:21:15 -03:00
oobabooga
8e706df20e Fix a memory leak when text streaming is on 2023-03-05 10:12:43 -03:00
oobabooga
c33715ad5b Move towards HF LLaMA implementation 2023-03-05 01:20:31 -03:00
oobabooga
bd8aac8fa4 Add LLaMA 8-bit support 2023-03-04 13:28:42 -03:00
oobabooga
c93f1fa99b Count the tokens more conservatively 2023-03-04 03:10:21 -03:00
oobabooga
ed8b35efd2 Add --pin-weight parameter for FlexGen 2023-03-04 01:04:02 -03:00
oobabooga
05e703b4a4 Print the performance information more reliably 2023-03-03 21:24:32 -03:00
oobabooga
5a79863df3 Increase the sequence length, decrease batch size
I have no idea what I am doing
2023-03-03 15:54:13 -03:00
oobabooga
a345a2acd2 Add a tokenizer placeholder 2023-03-03 15:16:55 -03:00
oobabooga
5b354817f6 Make chat minimally work with LLaMA 2023-03-03 15:04:41 -03:00
oobabooga
ea5c5eb3da Add LLaMA support 2023-03-03 14:39:14 -03:00
oobabooga
2bff646130 Stop chat from flashing dark when processing 2023-03-03 13:19:13 -03:00
oobabooga
169209805d Model-aware prompts and presets 2023-03-02 11:25:04 -03:00
oobabooga
7bbe32f618 Don't return a value in an iterator function 2023-03-02 00:48:46 -03:00
oobabooga
ff9f649c0c Remove some unused imports 2023-03-02 00:36:20 -03:00
oobabooga
1a05860ca3 Ensure proper no-streaming with generation_attempts > 1 2023-03-02 00:10:10 -03:00
oobabooga
a2a3e8f797 Add --rwkv-strategy parameter 2023-03-01 20:02:48 -03:00
oobabooga
449116a510 Fix RWKV paths on Windows (attempt) 2023-03-01 19:17:16 -03:00
oobabooga
955cf431e8 Minor consistency fix 2023-03-01 19:11:26 -03:00
oobabooga
f3da6dcc8f
Merge pull request #149 from oobabooga/RWKV
Add RWKV support
2023-03-01 16:57:45 -03:00
oobabooga
831ac7ed3f Add top_p 2023-03-01 16:45:48 -03:00
oobabooga
7c4d5ca8cc Improve the text generation call a bit 2023-03-01 16:40:25 -03:00
oobabooga
2f16ce309a Rename a variable 2023-03-01 12:33:09 -03:00
oobabooga
9e9cfc4b31 Parameters 2023-03-01 12:19:37 -03:00
oobabooga
0f6708c471 Sort the imports 2023-03-01 12:18:17 -03:00
oobabooga
e735806c51 Add a generate() function for RWKV 2023-03-01 12:16:11 -03:00
oobabooga
659bb76722 Add RWKVModel class 2023-03-01 12:08:55 -03:00
oobabooga
9c86a1cd4a Add RWKV pip package 2023-03-01 11:42:49 -03:00
oobabooga
6837d4d72a Load the model by name 2023-02-28 02:52:29 -03:00
oobabooga
a1429d1607 Add default extensions to the settings 2023-02-28 02:20:11 -03:00
oobabooga
19ccb2aaf5 Handle <USER> and <BOT> 2023-02-28 01:05:43 -03:00
oobabooga
626da6c731 Handle {{user}} and {{char}} in example dialogue 2023-02-28 00:59:05 -03:00
oobabooga
e861e68e38 Move the chat example dialogue to the prompt 2023-02-28 00:50:46 -03:00
oobabooga
f871971de1 Trying to get the chat to work 2023-02-28 00:25:30 -03:00
oobabooga
67ee7bead7 Add cpu, bf16 options 2023-02-28 00:09:11 -03:00
oobabooga
ebd698905c Add streaming to RWKV 2023-02-28 00:04:04 -03:00
oobabooga
70e522732c Move RWKV loader into a separate file 2023-02-27 23:50:16 -03:00
oobabooga
ebc64a408c RWKV support prototype 2023-02-27 23:03:35 -03:00
oobabooga
021bd55886 Better format the prompt when generation attempts > 1 2023-02-27 21:37:03 -03:00
oobabooga
43b6ab8673 Store thumbnails as files instead of base64 strings
This improves the UI responsiveness for large histories.
2023-02-27 13:41:00 -03:00
oobabooga
f24b6e78a3 Fix clear history 2023-02-26 23:58:04 -03:00
oobabooga
8e3e8a070f Make FlexGen work with the newest API 2023-02-26 16:53:41 -03:00
oobabooga
3333f94c30 Make the gallery extension work on colab 2023-02-26 12:37:26 -03:00
oobabooga
633a2b6be2 Don't regenerate/remove last message if the chat is empty 2023-02-26 00:43:12 -03:00
oobabooga
6e843a11d6 Fix FlexGen in chat mode 2023-02-26 00:36:04 -03:00
oobabooga
4548227fb5 Downgrade gradio version (file uploads are broken in 3.19.1) 2023-02-25 22:59:02 -03:00
oobabooga
9456c1d6ed Prevent streaming with no_stream + generation attempts > 1 2023-02-25 17:45:03 -03:00
oobabooga
32f40f3b42 Bump gradio version to 3.19.1 2023-02-25 17:20:03 -03:00
oobabooga
fa58fd5559 Proper way to free the cuda cache 2023-02-25 15:50:29 -03:00
oobabooga
b585e382c0 Rename the custom prompt generator function 2023-02-25 15:13:14 -03:00
oobabooga
700311ce40 Empty the cuda cache at model.generate() 2023-02-25 14:39:13 -03:00
oobabooga
1878acd9f3 Minor bug fix in chat 2023-02-25 09:30:59 -03:00
oobabooga
e71ff959f5 Clean up some unused code 2023-02-25 09:23:02 -03:00
oobabooga
91f5852245 Move bot_picture.py inside the extension 2023-02-25 03:00:19 -03:00
oobabooga
5ac24b019e Minor fix in the extensions implementation 2023-02-25 02:53:18 -03:00
oobabooga
85f914b9b9 Disable the hijack after using it 2023-02-25 02:36:01 -03:00
oobabooga
7e9f13e29f Rename a variable 2023-02-25 01:55:32 -03:00
oobabooga
1741c36092 Minor fix 2023-02-25 01:47:25 -03:00
oobabooga
7c2babfe39 Rename greed to "generation attempts" 2023-02-25 01:42:19 -03:00
oobabooga
2dfb999bf1 Add greed parameter 2023-02-25 01:31:01 -03:00
oobabooga
13f2688134 Better way to generate custom prompts 2023-02-25 01:08:17 -03:00