| Vhallo | a9a6d72d8c | Use gr.Number for RoPE scaling parameters (#6233). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com> | 2024-07-20 18:57:09 -03:00 |
| oobabooga | e436d69e2b | Add --no_xformers and --no_sdpa flags for ExllamaV2 | 2024-07-11 15:47:37 -07:00 |
| oobabooga | c176244327 | UI: Move cache_8bit/cache_4bit further up | 2024-07-05 12:16:21 -07:00 |
| oobabooga | e79e7b90dc | UI: Move the cache_8bit and cache_4bit elements up | 2024-07-04 20:21:28 -07:00 |
| oobabooga | 8b44d7b12a | Lint | 2024-07-04 20:16:44 -07:00 |
| GralchemOz | 8a39f579d8 | transformers: Add eager attention option to make Gemma-2 work properly (#6188) | 2024-07-01 12:08:08 -03:00 |
| oobabooga | 577a8cd3ee | Add TensorRT-LLM support (#5715) | 2024-06-24 02:30:03 -03:00 |
| oobabooga | b48ab482f8 | Remove obsolete "gptq_for_llama_info" message | 2024-06-23 22:05:19 -07:00 |
| GodEmperor785 | 2c5a9eb597 | Change limits of RoPE scaling sliders in UI (#6142) | 2024-06-19 21:42:17 -03:00 |
| Forkoz | 1d79aa67cf | Fix flash-attn UI parameter to actually store true. (#6076) | 2024-06-13 00:34:54 -03:00 |
| oobabooga | 2d196ed2fe | Remove obsolete pre_layer parameter | 2024-06-12 18:56:44 -07:00 |
| oobabooga | 4f1e96b9e3 | Downloader: Add --model-dir argument, respect --model-dir in the UI | 2024-05-23 20:42:46 -07:00 |
| oobabooga | 9e189947d1 | Minor fix after bd7cc4234d (thanks @belladoreai) | 2024-05-21 10:37:30 -07:00 |
| oobabooga | bd7cc4234d | Backend cleanup (#6025) | 2024-05-21 13:32:02 -03:00 |
| Samuel Wein | b63dc4e325 | UI: Warn user if they are trying to load a model from no path (#6006) | 2024-05-19 20:05:17 -03:00 |
| chr | 6b546a2c8b | llama.cpp: increase the max threads from 32 to 256 (#5889) | 2024-05-19 20:02:19 -03:00 |
| oobabooga | e61055253c | Bump llama-cpp-python to 0.2.69, add --flash-attn option | 2024-05-03 04:31:22 -07:00 |
| oobabooga | 51fb766bea | Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) | 2024-04-30 09:11:31 -03:00 |
| oobabooga | f0538efb99 | Remove obsolete --tensorcores references | 2024-04-24 00:31:28 -07:00 |
| wangshuai09 | fd4e46bce2 | Add Ascend NPU support (basic) (#5541) | 2024-04-11 18:42:20 -03:00 |
| Alex O'Connell | b94cd6754e | UI: Respect model and lora directory settings when downloading files (#5842) | 2024-04-11 01:55:02 -03:00 |
| oobabooga | d423021a48 | Remove CTransformers support (#5807) | 2024-04-04 20:23:58 -03:00 |
| oobabooga | 2a92a842ce | Bump gradio to 4.23 (#5758) | 2024-03-26 16:32:20 -03:00 |
| oobabooga | 49b111e2dd | Lint | 2024-03-17 08:33:23 -07:00 |
| oobabooga | 40a60e0297 | Convert attention_sink_size to int (closes #5696) | 2024-03-13 08:15:49 -07:00 |
| oobabooga | 63701f59cf | UI: mention that n_gpu_layers > 0 is necessary for the GPU to be used | 2024-03-11 18:54:15 -07:00 |
| oobabooga | 056717923f | Document StreamingLLM | 2024-03-10 19:15:23 -07:00 |
| oobabooga | 15d90d9bd5 | Minor logging change | 2024-03-10 18:20:50 -07:00 |
| oobabooga | afb51bd5d6 | Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) | 2024-03-09 00:25:33 -03:00 |
| Bartowski | 104573f7d4 | Update cache_4bit documentation (#5649). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com> | 2024-03-07 13:08:21 -03:00 |
| oobabooga | 2ec1d96c91 | Add cache_4bit option for ExLlamaV2 (#5645) | 2024-03-06 23:02:25 -03:00 |
| oobabooga | 2174958362 | Revert gradio to 3.50.2 (#5640) | 2024-03-06 11:52:46 -03:00 |
| oobabooga | 63a1d4afc8 | Bump gradio to 4.19 (#5522) | 2024-03-05 07:32:28 -03:00 |
| oobabooga | af0bbf5b13 | Lint | 2024-02-17 09:01:04 -08:00 |
| oobabooga | a6730f88f7 | Add --autosplit flag for ExLlamaV2 (#5524) | 2024-02-16 15:26:10 -03:00 |
| oobabooga | 76d28eaa9e | Add a menu for customizing the instruction template for the model (#5521) | 2024-02-16 14:21:17 -03:00 |
| oobabooga | 44018c2f69 | Add a "llamacpp_HF creator" menu (#5519) | 2024-02-16 12:43:24 -03:00 |
| oobabooga | 080f7132c0 | Revert gradio to 3.50.2 (#5513) | 2024-02-15 20:40:23 -03:00 |
| DominikKowalczyk | 33c4ce0720 | Bump gradio to 4.19 (#5419). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com> | 2024-02-14 23:28:26 -03:00 |
| oobabooga | b16958575f | Minor bug fix | 2024-02-13 19:48:32 -08:00 |
| oobabooga | d47182d9d1 | llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) | 2024-02-14 00:28:51 -03:00 |
| oobabooga | 2a1063eff5 | Revert "Remove non-HF ExLlamaV2 loader (#5431)". This reverts commit cde000d478. | 2024-02-06 06:21:36 -08:00 |
| oobabooga | 7301c7618f | Minor change to Models tab | 2024-02-04 21:49:58 -08:00 |
| oobabooga | 9033fa5eee | Organize the Model tab | 2024-02-04 19:30:22 -08:00 |
| Forkoz | 2a45620c85 | Split by rows instead of layers for llama.cpp multi-gpu (#5435) | 2024-02-04 23:36:40 -03:00 |
| Badis Ghoubali | 3df7e151f7 | fix the n_batch slider (#5436) | 2024-02-04 18:15:30 -03:00 |
| oobabooga | cde000d478 | Remove non-HF ExLlamaV2 loader (#5431) | 2024-02-04 01:15:51 -03:00 |
| Forkoz | 5c5ef4cef7 | UI: change n_gpu_layers maximum to 256 for larger models. (#5262) | 2024-01-17 17:13:16 -03:00 |
| oobabooga | cbf6f9e695 | Update some UI messages | 2023-12-30 21:31:17 -08:00 |
| oobabooga | 0e54a09bcb | Remove exllamav1 loaders (#5128) | 2023-12-31 01:57:06 -03:00 |