chr             | 6b546a2c8b | llama.cpp: increase the max threads from 32 to 256 (#5889) | 2024-05-19 20:02:19 -03:00
oobabooga       | e61055253c | Bump llama-cpp-python to 0.2.69, add --flash-attn option | 2024-05-03 04:31:22 -07:00
oobabooga       | 51fb766bea | Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) | 2024-04-30 09:11:31 -03:00
oobabooga       | f0538efb99 | Remove obsolete --tensorcores references | 2024-04-24 00:31:28 -07:00
wangshuai09     | fd4e46bce2 | Add Ascend NPU support (basic) (#5541) | 2024-04-11 18:42:20 -03:00
Alex O'Connell  | b94cd6754e | UI: Respect model and lora directory settings when downloading files (#5842) | 2024-04-11 01:55:02 -03:00
oobabooga       | d423021a48 | Remove CTransformers support (#5807) | 2024-04-04 20:23:58 -03:00
oobabooga       | 2a92a842ce | Bump gradio to 4.23 (#5758) | 2024-03-26 16:32:20 -03:00
oobabooga       | 49b111e2dd | Lint | 2024-03-17 08:33:23 -07:00
oobabooga       | 40a60e0297 | Convert attention_sink_size to int (closes #5696) | 2024-03-13 08:15:49 -07:00
oobabooga       | 63701f59cf | UI: mention that n_gpu_layers > 0 is necessary for the GPU to be used | 2024-03-11 18:54:15 -07:00
oobabooga       | 056717923f | Document StreamingLLM | 2024-03-10 19:15:23 -07:00
oobabooga       | 15d90d9bd5 | Minor logging change | 2024-03-10 18:20:50 -07:00
oobabooga       | afb51bd5d6 | Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) | 2024-03-09 00:25:33 -03:00
Bartowski       | 104573f7d4 | Update cache_4bit documentation (#5649) | 2024-03-07 13:08:21 -03:00
                  Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga       | 2ec1d96c91 | Add cache_4bit option for ExLlamaV2 (#5645) | 2024-03-06 23:02:25 -03:00
oobabooga       | 2174958362 | Revert gradio to 3.50.2 (#5640) | 2024-03-06 11:52:46 -03:00
oobabooga       | 63a1d4afc8 | Bump gradio to 4.19 (#5522) | 2024-03-05 07:32:28 -03:00
oobabooga       | af0bbf5b13 | Lint | 2024-02-17 09:01:04 -08:00
oobabooga       | a6730f88f7 | Add --autosplit flag for ExLlamaV2 (#5524) | 2024-02-16 15:26:10 -03:00
oobabooga       | 76d28eaa9e | Add a menu for customizing the instruction template for the model (#5521) | 2024-02-16 14:21:17 -03:00
oobabooga       | 44018c2f69 | Add a "llamacpp_HF creator" menu (#5519) | 2024-02-16 12:43:24 -03:00
oobabooga       | 080f7132c0 | Revert gradio to 3.50.2 (#5513) | 2024-02-15 20:40:23 -03:00
DominikKowalczyk | 33c4ce0720 | Bump gradio to 4.19 (#5419) | 2024-02-14 23:28:26 -03:00
                  Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga       | b16958575f | Minor bug fix | 2024-02-13 19:48:32 -08:00
oobabooga       | d47182d9d1 | llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) | 2024-02-14 00:28:51 -03:00
oobabooga       | 2a1063eff5 | Revert "Remove non-HF ExLlamaV2 loader (#5431)" | 2024-02-06 06:21:36 -08:00
                  This reverts commit cde000d478.
oobabooga       | 7301c7618f | Minor change to Models tab | 2024-02-04 21:49:58 -08:00
oobabooga       | 9033fa5eee | Organize the Model tab | 2024-02-04 19:30:22 -08:00
Forkoz          | 2a45620c85 | Split by rows instead of layers for llama.cpp multi-gpu (#5435) | 2024-02-04 23:36:40 -03:00
Badis Ghoubali  | 3df7e151f7 | fix the n_batch slider (#5436) | 2024-02-04 18:15:30 -03:00
oobabooga       | cde000d478 | Remove non-HF ExLlamaV2 loader (#5431) | 2024-02-04 01:15:51 -03:00
Forkoz          | 5c5ef4cef7 | UI: change n_gpu_layers maximum to 256 for larger models. (#5262) | 2024-01-17 17:13:16 -03:00
oobabooga       | cbf6f9e695 | Update some UI messages | 2023-12-30 21:31:17 -08:00
oobabooga       | 0e54a09bcb | Remove exllamav1 loaders (#5128) | 2023-12-31 01:57:06 -03:00
oobabooga       | e83e6cedbe | Organize the model menu | 2023-12-19 13:18:26 -08:00
oobabooga       | de138b8ba6 | Add llama-cpp-python wheels with tensor cores support (#5003) | 2023-12-19 17:30:53 -03:00
oobabooga       | 0a299d5959 | Bump llama-cpp-python to 0.2.24 (#5001) | 2023-12-19 15:22:21 -03:00
oobabooga       | f6d701624c | UI: mention that QuIP# does not work on Windows | 2023-12-18 18:05:02 -08:00
Water           | 674be9a09a | Add HQQ quant loader (#4888) | 2023-12-18 21:23:16 -03:00
                  Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga       | f1f2c4c3f4 | Add --num_experts_per_token parameter (ExLlamav2) (#4955) | 2023-12-17 12:08:33 -03:00
oobabooga       | 3bbf6c601d | AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) | 2023-12-15 06:46:13 -08:00
oobabooga       | 7f1a6a70e3 | Update the llamacpp_HF comment | 2023-12-12 21:04:20 -08:00
Morgan Schweers | 602b8c6210 | Make new browser reloads recognize current model. (#4865) | 2023-12-11 02:51:01 -03:00
oobabooga       | 2a335b8aa7 | Cleanup: set shared.model_name only once | 2023-12-08 06:35:23 -08:00
oobabooga       | 7fc9033b2e | Recommend ExLlama_HF and ExLlamav2_HF | 2023-12-04 15:28:46 -08:00
oobabooga       | e0ca49ed9c | Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637) | 2023-11-18 00:31:27 -03:00
                  * Update requirements*.txt
                  * Add back seed
oobabooga       | 9d6f79db74 | Revert "Bump llama-cpp-python to 0.2.18 (#4611)" | 2023-11-17 05:14:25 -08:00
                  This reverts commit 923c8e25fb.
oobabooga       | 8b66d83aa9 | Set use_fast=True by default, create --no_use_fast flag | 2023-11-16 19:55:28 -08:00
                  This increases tokens/second for HF loaders.
oobabooga       | 923c8e25fb | Bump llama-cpp-python to 0.2.18 (#4611) | 2023-11-16 22:55:14 -03:00