oobabooga
|
9d6f79db74
|
Revert "Bump llama-cpp-python to 0.2.18 (#4611)"
This reverts commit 923c8e25fb.
|
2023-11-17 05:14:25 -08:00 |
|
oobabooga
|
13dc3b61da
|
Update README
|
2023-11-16 19:57:55 -08:00 |
|
oobabooga
|
8b66d83aa9
|
Set use_fast=True by default, create --no_use_fast flag
This increases tokens/second for HF loaders.
|
2023-11-16 19:55:28 -08:00 |
|
oobabooga
|
923c8e25fb
|
Bump llama-cpp-python to 0.2.18 (#4611)
|
2023-11-16 22:55:14 -03:00 |
|
oobabooga
|
4aabff3728
|
Remove old API, launch OpenAI API with --api
|
2023-11-10 06:39:08 -08:00 |
|
oobabooga
|
6e2e0317af
|
Separate context and system message in instruction formats (#4499)
|
2023-11-07 20:02:58 -03:00 |
|
oobabooga
|
af3d25a503
|
Disable logits_all in llamacpp_HF (makes processing 3x faster)
|
2023-11-07 14:35:48 -08:00 |
|
oobabooga
|
ec17a5d2b7
|
Make OpenAI API the default API (#4430)
|
2023-11-06 02:38:29 -03:00 |
|
feng lui
|
4766a57352
|
transformers: add use_flash_attention_2 option (#4373)
|
2023-11-04 13:59:33 -03:00 |
|
Julien Chaumond
|
fdcaa955e3
|
transformers: Add a flag to force load from safetensors (#4450)
|
2023-11-02 16:20:54 -03:00 |
|
oobabooga
|
c0655475ae
|
Add cache_8bit option
|
2023-11-02 11:23:04 -07:00 |
|
oobabooga
|
77abd9b69b
|
Add no_flash_attn option
|
2023-11-02 11:08:53 -07:00 |
|
oobabooga
|
1edf321362
|
Lint
|
2023-10-23 13:09:03 -07:00 |
|
oobabooga
|
df90d03e0b
|
Replace --mul_mat_q with --no_mul_mat_q
|
2023-10-22 12:23:03 -07:00 |
|
oobabooga
|
2d1b3332e4
|
Ignore warnings on Colab
|
2023-10-21 21:45:25 -07:00 |
|
oobabooga
|
506d05aede
|
Organize command-line arguments
|
2023-10-21 18:52:59 -07:00 |
|
cal066
|
cc632c3f33
|
AutoAWQ: initial support (#3999)
|
2023-10-05 13:19:18 -03:00 |
|
oobabooga
|
b6fe6acf88
|
Add threads_batch parameter
|
2023-10-01 21:28:00 -07:00 |
|
jllllll
|
41a2de96e5
|
Bump llama-cpp-python to 0.2.11
|
2023-10-01 18:08:10 -05:00 |
|
oobabooga
|
f931184b53
|
Increase truncation limits to 32768
|
2023-09-28 19:28:22 -07:00 |
|
StoyanStAtanasov
|
7e6ff8d1f0
|
Enable NUMA feature for llama_cpp_python (#4040)
|
2023-09-26 22:05:00 -03:00 |
|
oobabooga
|
1ca54faaf0
|
Improve --multi-user mode
|
2023-09-26 06:42:33 -07:00 |
|
oobabooga
|
7f1460af29
|
Change a warning
|
2023-09-25 20:22:27 -07:00 |
|
oobabooga
|
d0d221df49
|
Add --use_fast option (closes #3741)
|
2023-09-25 12:19:43 -07:00 |
|
oobabooga
|
00ab450c13
|
Multiple histories for each character (#4022)
|
2023-09-21 17:19:32 -03:00 |
|
oobabooga
|
5075087461
|
Fix command-line arguments being ignored
|
2023-09-19 13:11:46 -07:00 |
|
missionfloyd
|
2ad6ca8874
|
Add back chat buttons with --chat-buttons (#3947)
|
2023-09-16 00:39:37 -03:00 |
|
saltacc
|
f01b9aa71f
|
Add customizable ban tokens (#3899)
|
2023-09-15 18:27:27 -03:00 |
|
oobabooga
|
3d1c0f173d
|
User config precedence over GGUF metadata
|
2023-09-14 12:15:52 -07:00 |
|
oobabooga
|
2f935547c8
|
Minor changes
|
2023-09-12 15:05:21 -07:00 |
|
oobabooga
|
c2a309f56e
|
Add ExLlamaV2 and ExLlamav2_HF loaders (#3881)
|
2023-09-12 14:33:07 -03:00 |
|
oobabooga
|
dae428a967
|
Revamp cai-chat theme, make it default
|
2023-09-11 19:30:40 -07:00 |
|
oobabooga
|
ed86878f02
|
Remove GGML support
|
2023-09-11 07:44:00 -07:00 |
|
oobabooga
|
cec8db52e5
|
Add max_tokens_second param (#3533)
|
2023-08-29 17:44:31 -03:00 |
|
oobabooga
|
36864cb3e8
|
Use Alpaca as the default instruction template
|
2023-08-29 13:06:25 -07:00 |
|
Cebtenzzre
|
2f5d769a8d
|
accept floating-point alpha value on the command line (#3712)
|
2023-08-27 18:54:43 -03:00 |
|
oobabooga
|
f4f04c8c32
|
Fix a typo
|
2023-08-25 07:08:38 -07:00 |
|
oobabooga
|
52ab2a6b9e
|
Add rope_freq_base parameter for CodeLlama
|
2023-08-25 06:55:15 -07:00 |
|
oobabooga
|
d6934bc7bc
|
Implement CFG for ExLlama_HF (#3666)
|
2023-08-24 16:27:36 -03:00 |
|
oobabooga
|
7cba000421
|
Bump llama-cpp-python, +tensor_split by @shouyiwang, +mul_mat_q (#3610)
|
2023-08-18 12:03:34 -03:00 |
|
oobabooga
|
73d9befb65
|
Make "Show controls" customizable through settings.yaml
|
2023-08-16 07:04:18 -07:00 |
|
oobabooga
|
ccfc02a28d
|
Add the --disable_exllama option for AutoGPTQ (#3545 from clefever/disable-exllama)
|
2023-08-14 15:15:55 -03:00 |
|
oobabooga
|
d8a82d34ed
|
Improve a warning
|
2023-08-14 08:46:05 -07:00 |
|
oobabooga
|
619cb4e78b
|
Add "save defaults to settings.yaml" button (#3574)
|
2023-08-14 11:46:07 -03:00 |
|
oobabooga
|
a1a9ec895d
|
Unify the 3 interface modes (#3554)
|
2023-08-13 01:12:15 -03:00 |
|
Chris Lefever
|
0230fa4e9c
|
Add the --disable_exllama option for AutoGPTQ
|
2023-08-12 02:26:58 -04:00 |
|
cal066
|
7a4fcee069
|
Add ctransformers support (#3313)
---------
Co-authored-by: cal066 <cal066@users.noreply.github.com>
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
Co-authored-by: randoentity <137087500+randoentity@users.noreply.github.com>
|
2023-08-11 14:41:33 -03:00 |
|
jllllll
|
bee73cedbd
|
Streamline GPTQ-for-LLaMa support
|
2023-08-09 23:42:34 -05:00 |
|
oobabooga
|
d8fb506aff
|
Add RoPE scaling support for transformers (including dynamic NTK)
https://github.com/huggingface/transformers/pull/24653
|
2023-08-08 21:25:48 -07:00 |
|
Friedemann Lipphardt
|
901b028d55
|
Add option for named cloudflare tunnels (#3364)
|
2023-08-08 22:20:27 -03:00 |
|
oobabooga
|
a373c96d59
|
Fix a bug in modules/shared.py
|
2023-08-06 20:36:35 -07:00 |
|
oobabooga
|
3d48933f27
|
Remove ancient deprecation warnings
|
2023-08-06 18:58:59 -07:00 |
|
oobabooga
|
0af10ab49b
|
Add Classifier Free Guidance (CFG) for Transformers/ExLlama (#3325)
|
2023-08-06 17:22:48 -03:00 |
|
oobabooga
|
8df3cdfd51
|
Add SSL certificate support (#3453)
|
2023-08-04 13:57:31 -03:00 |
|
oobabooga
|
87dab03dc0
|
Add the --cpu option for llama.cpp to prevent CUDA from being used (#3432)
|
2023-08-03 11:00:36 -03:00 |
|
oobabooga
|
32c564509e
|
Fix loading session in chat mode
|
2023-08-02 21:13:16 -07:00 |
|
oobabooga
|
e931844fe2
|
Add auto_max_new_tokens parameter (#3419)
|
2023-08-02 14:52:20 -03:00 |
|
oobabooga
|
8d46a8c50a
|
Change the default chat style and the default preset
|
2023-08-01 09:35:17 -07:00 |
|
oobabooga
|
b17893a58f
|
Revert "Add tensor split support for llama.cpp (#3171)"
This reverts commit 031fe7225e.
|
2023-07-26 07:06:01 -07:00 |
|
oobabooga
|
28779cd959
|
Use dark theme by default
|
2023-07-25 20:11:57 -07:00 |
|
oobabooga
|
77d2e9f060
|
Remove flexgen 2
|
2023-07-25 15:18:25 -07:00 |
|
oobabooga
|
75c2dd38cf
|
Remove flexgen support
|
2023-07-25 15:15:29 -07:00 |
|
Shouyi
|
031fe7225e
|
Add tensor split support for llama.cpp (#3171)
|
2023-07-25 18:59:26 -03:00 |
|
Eve
|
f653546484
|
README updates and improvements (#3198)
|
2023-07-25 18:58:13 -03:00 |
|
oobabooga
|
a07d070b6c
|
Add llama-2-70b GGML support (#3285)
|
2023-07-24 16:37:03 -03:00 |
|
oobabooga
|
913e060348
|
Change the default preset to Divine Intellect
It seems to reduce hallucination while using instruction-tuned models.
|
2023-07-19 08:24:37 -07:00 |
|
oobabooga
|
8c1c2e0fae
|
Increase max_new_tokens upper limit
|
2023-07-17 17:08:22 -07:00 |
|
oobabooga
|
b1a6ea68dd
|
Disable "autoload the model" by default
|
2023-07-17 07:40:56 -07:00 |
|
oobabooga
|
5e3f7e00a9
|
Create llamacpp_HF loader (#3062)
|
2023-07-16 02:21:13 -03:00 |
|
oobabooga
|
e202190c4f
|
lint
|
2023-07-12 11:33:25 -07:00 |
|
FartyPants
|
9b55d3a9f9
|
More robust and error prone training (#3058)
|
2023-07-12 15:29:43 -03:00 |
|
Gabriel Pena
|
eedb3bf023
|
Add low vram mode on llama cpp (#3076)
|
2023-07-12 11:05:13 -03:00 |
|
Panchovix
|
10c8c197bf
|
Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955)
|
2023-07-04 01:13:16 -03:00 |
|
oobabooga
|
4b1804a438
|
Implement sessions + add basic multi-user support (#2991)
|
2023-07-04 00:03:30 -03:00 |
|
oobabooga
|
c52290de50
|
ExLlama with long context (#2875)
|
2023-06-25 22:49:26 -03:00 |
|
oobabooga
|
3ae9af01aa
|
Add --no_use_cuda_fp16 param for AutoGPTQ
|
2023-06-23 12:22:56 -03:00 |
|
oobabooga
|
383c50f05b
|
Replace old presets with the results of Preset Arena (#2830)
|
2023-06-23 01:48:29 -03:00 |
|
LarryVRH
|
580c1ee748
|
Implement a demo HF wrapper for exllama to utilize existing HF transformers decoding. (#2777)
|
2023-06-21 15:31:42 -03:00 |
|
oobabooga
|
e19cbea719
|
Add a variable to modules/shared.py
|
2023-06-17 19:02:29 -03:00 |
|
oobabooga
|
5f392122fd
|
Add gpu_split param to ExLlama
Adapted from code created by Ph0rk0z. Thank you Ph0rk0z.
|
2023-06-16 20:49:36 -03:00 |
|
oobabooga
|
9f40032d32
|
Add ExLlama support (#2444)
|
2023-06-16 20:35:38 -03:00 |
|
oobabooga
|
7ef6a50e84
|
Reorganize model loading UI completely (#2720)
|
2023-06-16 19:00:37 -03:00 |
|
Tom Jobbins
|
646b0c889f
|
AutoGPTQ: Add UI and command line support for disabling fused attention and fused MLP (#2648)
|
2023-06-15 23:59:54 -03:00 |
|
oobabooga
|
00b94847da
|
Remove softprompt support
|
2023-06-06 07:42:23 -03:00 |
|
oobabooga
|
3a5cfe96f0
|
Increase chat_prompt_size_max
|
2023-06-05 17:37:37 -03:00 |
|
oobabooga
|
f276d88546
|
Use AutoGPTQ by default for GPTQ models
|
2023-06-05 15:41:48 -03:00 |
|
oobabooga
|
19f78684e6
|
Add "Start reply with" feature to chat mode
|
2023-06-02 13:58:08 -03:00 |
|
LaaZa
|
9c066601f5
|
Extend AutoGPTQ support for any GPTQ model (#1668)
|
2023-06-02 01:33:55 -03:00 |
|
oobabooga
|
a83f9aa65b
|
Update shared.py
|
2023-06-01 12:08:39 -03:00 |
|
Honkware
|
204731952a
|
Falcon support (trust-remote-code and autogptq checkboxes) (#2367)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
|
2023-05-29 10:20:18 -03:00 |
|
oobabooga
|
2f811b1bdf
|
Change a warning message
|
2023-05-28 22:48:20 -03:00 |
|
oobabooga
|
00ebea0b2a
|
Use YAML for presets and settings
|
2023-05-28 22:34:12 -03:00 |
|
oobabooga
|
8efdc01ffb
|
Better default for compute_dtype
|
2023-05-25 15:05:53 -03:00 |
|
DGdev91
|
cf088566f8
|
Make llama.cpp read prompt size and seed from settings (#2299)
|
2023-05-25 10:29:31 -03:00 |
|
oobabooga
|
361451ba60
|
Add --load-in-4bit parameter (#2320)
|
2023-05-25 01:14:13 -03:00 |
|
flurb18
|
d37a28730d
|
Beginning of multi-user support (#2262)
Adds a lock to generate_reply
|
2023-05-24 09:38:20 -03:00 |
|
Gabriel Terrien
|
7aed53559a
|
Support of the --gradio-auth flag (#2283)
|
2023-05-23 20:39:26 -03:00 |
|
oobabooga
|
cd3618d7fb
|
Add support for RWKV in Hugging Face format
|
2023-05-23 02:07:28 -03:00 |
|
Gabriel Terrien
|
0f51b64bb3
|
Add a "dark_theme" option to settings.json (#2288)
|
2023-05-22 19:45:11 -03:00 |
|
oobabooga
|
d63ef59a0f
|
Apply LLaMA-Precise preset to Vicuna by default
|
2023-05-21 23:00:42 -03:00 |
|