catalpaaa
9e2963e0c8
Update server.py
2023-03-24 17:35:45 -07:00
catalpaaa
ec2a1facee
Update server.py
2023-03-24 17:34:33 -07:00
catalpaaa
b37c54edcf
lora-dir, model-dir and login auth
...
Added lora-dir, model-dir, and a login auth arguments that points to a file contains usernames and passwords in the format of "u:pw,u:pw,..."
2023-03-24 17:30:18 -07:00
oobabooga
9fa47c0eed
Revert GPTQ_loader.py (accident)
2023-03-24 19:57:12 -03:00
oobabooga
a6bf54739c
Revert models.py (accident)
2023-03-24 19:56:45 -03:00
oobabooga
0a16224451
Update GPTQ_loader.py
2023-03-24 19:54:36 -03:00
oobabooga
a80aa65986
Update models.py
2023-03-24 19:53:20 -03:00
oobabooga
507db0929d
Do not use empty user messages in chat mode
...
This allows the bot to send messages by clicking on Generate with empty inputs.
2023-03-24 17:22:22 -03:00
oobabooga
6e1b16c2aa
Update html_generator.py
2023-03-24 17:18:27 -03:00
oobabooga
ffb0187e83
Update chat.py
2023-03-24 17:17:29 -03:00
oobabooga
c14e598f14
Merge pull request #433 from mayaeary/fix/api-reload
...
Fix api extension duplicating
2023-03-24 16:56:10 -03:00
oobabooga
bfe960731f
Merge branch 'main' into fix/api-reload
2023-03-24 16:54:41 -03:00
oobabooga
4a724ed22f
Reorder imports
2023-03-24 16:53:56 -03:00
oobabooga
8fad84abc2
Update extensions.py
2023-03-24 16:51:27 -03:00
oobabooga
d8e950d6bd
Don't load the model twice when using --lora
2023-03-24 16:30:32 -03:00
oobabooga
fd99995b01
Make the Stop button more consistent in chat mode
2023-03-24 15:59:27 -03:00
Forkoz
b740c5b284
Add display of context when input was generated
...
Not sure if I did this right but it does move with the conversation and seems to match value.
2023-03-24 08:56:07 -05:00
oobabooga
4f5c2ce785
Fix chat_generation_attempts
2023-03-24 02:03:30 -03:00
oobabooga
04417b658b
Update README.md
2023-03-24 01:40:43 -03:00
oobabooga
bb4cb22453
Download .pt files using download-model.py (for 4-bit models)
2023-03-24 00:49:04 -03:00
oobabooga
143b5b5edf
Mention one-click-bandaid in the README
2023-03-23 23:28:50 -03:00
EyeDeck
dcfd866402
Allow loading of .safetensors through GPTQ-for-LLaMa
2023-03-23 21:31:34 -04:00
oobabooga
8747c74339
Another missing import
2023-03-23 22:19:01 -03:00
oobabooga
7078d168c3
Missing import
2023-03-23 22:16:08 -03:00
oobabooga
d1327f99f9
Fix broken callbacks.py
2023-03-23 22:12:24 -03:00
oobabooga
9bdb3c784d
Minor fix
2023-03-23 22:02:40 -03:00
oobabooga
b0abb327d8
Update LoRA.py
2023-03-23 22:02:09 -03:00
oobabooga
bf22d16ebc
Clear cache while switching LoRAs
2023-03-23 21:56:26 -03:00
oobabooga
4578e88ffd
Stop the bot from talking for you in chat mode
2023-03-23 21:38:20 -03:00
oobabooga
9bf6ecf9e2
Fix LoRA device map (attempt)
2023-03-23 16:49:41 -03:00
oobabooga
c5ebcc5f7e
Change the default names ( #518 )
...
* Update shared.py
* Update settings-template.json
2023-03-23 13:36:00 -03:00
Φφ
483d173d23
Code reuse + indication
...
Now shows the message in the console when unloading weights. Also reload_model() calls unload_model() first to free the memory so that multiple reloads won't overfill it.
2023-03-23 07:06:26 +03:00
Φφ
1917b15275
Unload and reload models on request
2023-03-23 07:06:26 +03:00
oobabooga
29bd41d453
Fix LoRA in CPU mode
2023-03-23 01:05:13 -03:00
oobabooga
eac27f4f55
Make LoRAs work in 16-bit mode
2023-03-23 00:55:33 -03:00
oobabooga
bfa81e105e
Fix FlexGen streaming
2023-03-23 00:22:14 -03:00
oobabooga
7b6f85d327
Fix markdown headers in light mode
2023-03-23 00:13:34 -03:00
oobabooga
de6a09dc7f
Properly separate the original prompt from the reply
2023-03-23 00:12:40 -03:00
oobabooga
d5fc1bead7
Merge pull request #489 from Brawlence/ext-fixes
...
Extensions performance & memory optimisations
2023-03-22 16:10:59 -03:00
oobabooga
bfb1be2820
Minor fix
2023-03-22 16:09:48 -03:00
oobabooga
0abff499e2
Use image.thumbnail
2023-03-22 16:03:05 -03:00
oobabooga
104212529f
Minor changes
2023-03-22 15:55:03 -03:00
wywywywy
61346b88ea
Add "seed" menu in the Parameters tab
2023-03-22 15:40:20 -03:00
Φφ
5389fce8e1
Extensions performance & memory optimisations
...
Reworked remove_surrounded_chars() to use regular expression ( https://regexr.com/7alb5 ) instead of repeated string concatenations for elevenlab_tts, silero_tts, sd_api_pictures. This should be both faster and more robust in handling asterisks.
Reduced the memory footprint of send_pictures and sd_api_pictures by scaling the images in the chat to 300 pixels max-side wise. (The user already has the original in case of the sent picture and there's an option to save the SD generation).
This should fix history growing annoyingly large with multiple pictures present
2023-03-22 11:51:00 +03:00
oobabooga
45b7e53565
Only catch proper Exceptions in the text generation function
2023-03-20 20:36:02 -03:00
oobabooga
6872ffd976
Update README.md
2023-03-20 16:53:14 -03:00
oobabooga
db4219a340
Update comments
2023-03-20 16:40:08 -03:00
oobabooga
7618f3fe8c
Add -gptq-preload for 4-bit offloading ( #460 )
...
This works in a 4GB card now:
```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
2023-03-20 16:30:56 -03:00
Vladimir Belitskiy
e96687b1d6
Do not send empty user input as part of the prompt.
...
However, if extensions modify the empty prompt to be non-empty,
it'l still work as before.
2023-03-20 14:27:39 -04:00
oobabooga
9a3bed50c3
Attempt at fixing 4-bit with CPU offload
2023-03-20 15:11:56 -03:00