oobabooga
|
0589ff5b12
|
Bump llama-cpp-python to 0.2.19 & add min_p and typical_p parameters to llama.cpp loader (#4701)
|
2023-11-21 20:59:39 -03:00 |
|
oobabooga
|
e0ca49ed9c
|
Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637)
* Update requirements*.txt
* Add back seed
|
2023-11-18 00:31:27 -03:00 |
|
oobabooga
|
9d6f79db74
|
Revert "Bump llama-cpp-python to 0.2.18 (#4611)"
This reverts commit 923c8e25fb.
|
2023-11-17 05:14:25 -08:00 |
|
oobabooga
|
8b66d83aa9
|
Set use_fast=True by default, create --no_use_fast flag
This increases tokens/second for HF loaders.
|
2023-11-16 19:55:28 -08:00 |
|
oobabooga
|
923c8e25fb
|
Bump llama-cpp-python to 0.2.18 (#4611)
|
2023-11-16 22:55:14 -03:00 |
|
oobabooga
|
58c6001be9
|
Add missing exllamav2 samplers
|
2023-11-16 07:09:40 -08:00 |
|
oobabooga
|
af3d25a503
|
Disable logits_all in llamacpp_HF (makes processing 3x faster)
|
2023-11-07 14:35:48 -08:00 |
|
feng lui
|
4766a57352
|
transformers: add use_flash_attention_2 option (#4373)
|
2023-11-04 13:59:33 -03:00 |
|
oobabooga
|
aa5d671579
|
Add temperature_last parameter (#4472)
|
2023-11-04 13:09:07 -03:00 |
|
kalomaze
|
367e5e6e43
|
Implement Min P as a sampler option in HF loaders (#4449)
|
2023-11-02 16:32:51 -03:00 |
|
oobabooga
|
c0655475ae
|
Add cache_8bit option
|
2023-11-02 11:23:04 -07:00 |
|
tdrussell
|
72f6fc6923
|
Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (#4376)
|
2023-10-25 12:10:28 -03:00 |
|
oobabooga
|
ef1489cd4d
|
Remove unused parameter in AutoAWQ
|
2023-10-23 20:45:43 -07:00 |
|
tdrussell
|
4440f87722
|
Add additive_repetition_penalty sampler setting. (#3627)
|
2023-10-23 02:28:07 -03:00 |
|
oobabooga
|
df90d03e0b
|
Replace --mul_mat_q with --no_mul_mat_q
|
2023-10-22 12:23:03 -07:00 |
|
Johan
|
1d5a015ce7
|
Enable special token support for exllamav2 (#4314)
|
2023-10-21 01:54:06 -03:00 |
|
turboderp
|
8a98646a21
|
Bump ExLlamaV2 to 0.0.5 (#4186)
|
2023-10-05 19:12:22 -03:00 |
|
cal066
|
cc632c3f33
|
AutoAWQ: initial support (#3999)
|
2023-10-05 13:19:18 -03:00 |
|
oobabooga
|
ae4ba3007f
|
Add grammar to transformers and _HF loaders (#4091)
|
2023-10-05 10:01:36 -03:00 |
|
oobabooga
|
b6fe6acf88
|
Add threads_batch parameter
|
2023-10-01 21:28:00 -07:00 |
|
jllllll
|
41a2de96e5
|
Bump llama-cpp-python to 0.2.11
|
2023-10-01 18:08:10 -05:00 |
|
oobabooga
|
56b5a4af74
|
exllamav2 typical_p
|
2023-09-28 20:10:12 -07:00 |
|
StoyanStAtanasov
|
7e6ff8d1f0
|
Enable NUMA feature for llama_cpp_python (#4040)
|
2023-09-26 22:05:00 -03:00 |
|
oobabooga
|
d0d221df49
|
Add --use_fast option (closes #3741)
|
2023-09-25 12:19:43 -07:00 |
|
oobabooga
|
36c38d7561
|
Add disable_exllama to Transformers loader (for GPTQ LoRA training)
|
2023-09-24 20:03:11 -07:00 |
|
oobabooga
|
08cf150c0c
|
Add a grammar editor to the UI (#4061)
|
2023-09-24 18:05:24 -03:00 |
|
oobabooga
|
eb0b7c1053
|
Fix a minor UI bug
|
2023-09-24 07:17:33 -07:00 |
|
oobabooga
|
b227e65d86
|
Add grammar to llama.cpp loader (closes #4019)
|
2023-09-24 07:10:45 -07:00 |
|
saltacc
|
f01b9aa71f
|
Add customizable ban tokens (#3899)
|
2023-09-15 18:27:27 -03:00 |
|
oobabooga
|
2f935547c8
|
Minor changes
|
2023-09-12 15:05:21 -07:00 |
|
oobabooga
|
18e6b275f3
|
Add alpha_value/compress_pos_emb to ExLlama-v2
|
2023-09-12 15:02:47 -07:00 |
|
oobabooga
|
c2a309f56e
|
Add ExLlamaV2 and ExLlamav2_HF loaders (#3881)
|
2023-09-12 14:33:07 -03:00 |
|
oobabooga
|
ed86878f02
|
Remove GGML support
|
2023-09-11 07:44:00 -07:00 |
|
Ravindra Marella
|
e4c3e1bdd2
|
Fix ctransformers model unload (#3711)
Add missing comma in model types list
Fixes marella/ctransformers#111
|
2023-08-27 10:53:48 -03:00 |
|
oobabooga
|
52ab2a6b9e
|
Add rope_freq_base parameter for CodeLlama
|
2023-08-25 06:55:15 -07:00 |
|
oobabooga
|
3320accfdc
|
Add CFG to llamacpp_HF (second attempt) (#3678)
|
2023-08-24 20:32:21 -03:00 |
|
oobabooga
|
d6934bc7bc
|
Implement CFG for ExLlama_HF (#3666)
|
2023-08-24 16:27:36 -03:00 |
|
cal066
|
e042bf8624
|
ctransformers: add mlock and no-mmap options (#3649)
|
2023-08-22 16:51:34 -03:00 |
|
oobabooga
|
7cba000421
|
Bump llama-cpp-python, +tensor_split by @shouyiwang, +mul_mat_q (#3610)
|
2023-08-18 12:03:34 -03:00 |
|
cal066
|
991bb57e43
|
ctransformers: Fix up model_type name consistency (#3567)
|
2023-08-14 15:17:24 -03:00 |
|
oobabooga
|
ccfc02a28d
|
Add the --disable_exllama option for AutoGPTQ (#3545 from clefever/disable-exllama)
|
2023-08-14 15:15:55 -03:00 |
|
Eve
|
66c04c304d
|
Various ctransformers fixes (#3556)
---------
Co-authored-by: cal066 <cal066@users.noreply.github.com>
|
2023-08-13 23:09:03 -03:00 |
|
cal066
|
bf70c19603
|
ctransformers: move thread and seed parameters (#3543)
|
2023-08-13 00:04:03 -03:00 |
|
Chris Lefever
|
0230fa4e9c
|
Add the --disable_exllama option for AutoGPTQ
|
2023-08-12 02:26:58 -04:00 |
|
oobabooga
|
2f918ccf7c
|
Remove unused parameter
|
2023-08-11 11:15:22 -07:00 |
|
oobabooga
|
28c8df337b
|
Add repetition_penalty_range to ctransformers
|
2023-08-11 11:04:19 -07:00 |
|
cal066
|
7a4fcee069
|
Add ctransformers support (#3313)
---------
Co-authored-by: cal066 <cal066@users.noreply.github.com>
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
Co-authored-by: randoentity <137087500+randoentity@users.noreply.github.com>
|
2023-08-11 14:41:33 -03:00 |
|
oobabooga
|
d8fb506aff
|
Add RoPE scaling support for transformers (including dynamic NTK)
https://github.com/huggingface/transformers/pull/24653
|
2023-08-08 21:25:48 -07:00 |
|
oobabooga
|
0af10ab49b
|
Add Classifier Free Guidance (CFG) for Transformers/ExLlama (#3325)
|
2023-08-06 17:22:48 -03:00 |
|
oobabooga
|
87dab03dc0
|
Add the --cpu option for llama.cpp to prevent CUDA from being used (#3432)
|
2023-08-03 11:00:36 -03:00 |
|
oobabooga
|
32a2bbee4a
|
Implement auto_max_new_tokens for ExLlama
|
2023-08-02 11:03:56 -07:00 |
|
oobabooga
|
e931844fe2
|
Add auto_max_new_tokens parameter (#3419)
|
2023-08-02 14:52:20 -03:00 |
|
oobabooga
|
84297d05c4
|
Add a "Filter by loader" menu to the Parameters tab
|
2023-07-31 19:09:02 -07:00 |
|
oobabooga
|
b17893a58f
|
Revert "Add tensor split support for llama.cpp (#3171)"
This reverts commit 031fe7225e.
|
2023-07-26 07:06:01 -07:00 |
|
Shouyi
|
031fe7225e
|
Add tensor split support for llama.cpp (#3171)
|
2023-07-25 18:59:26 -03:00 |
|
oobabooga
|
a07d070b6c
|
Add llama-2-70b GGML support (#3285)
|
2023-07-24 16:37:03 -03:00 |
|
randoentity
|
a69955377a
|
[GGML] Support for customizable RoPE (#3083)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
|
2023-07-17 22:32:37 -03:00 |
|
oobabooga
|
5e3f7e00a9
|
Create llamacpp_HF loader (#3062)
|
2023-07-16 02:21:13 -03:00 |
|
oobabooga
|
e202190c4f
|
lint
|
2023-07-12 11:33:25 -07:00 |
|
Gabriel Pena
|
eedb3bf023
|
Add low vram mode on llama cpp (#3076)
|
2023-07-12 11:05:13 -03:00 |
|
Panchovix
|
10c8c197bf
|
Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955)
|
2023-07-04 01:13:16 -03:00 |
|
oobabooga
|
c52290de50
|
ExLlama with long context (#2875)
|
2023-06-25 22:49:26 -03:00 |
|
oobabooga
|
3ae9af01aa
|
Add --no_use_cuda_fp16 param for AutoGPTQ
|
2023-06-23 12:22:56 -03:00 |
|
LarryVRH
|
580c1ee748
|
Implement a demo HF wrapper for exllama to utilize existing HF transformers decoding. (#2777)
|
2023-06-21 15:31:42 -03:00 |
|
oobabooga
|
5f392122fd
|
Add gpu_split param to ExLlama
Adapted from code created by Ph0rk0z. Thank you Ph0rk0z.
|
2023-06-16 20:49:36 -03:00 |
|
oobabooga
|
9f40032d32
|
Add ExLlama support (#2444)
|
2023-06-16 20:35:38 -03:00 |
|
oobabooga
|
dea43685b0
|
Add some clarifications
|
2023-06-16 19:10:53 -03:00 |
|
oobabooga
|
7ef6a50e84
|
Reorganize model loading UI completely (#2720)
|
2023-06-16 19:00:37 -03:00 |
|