Commit Graph

19 Commits

Author SHA1 Message Date
RandoInternetPreson
46996f6519
ExllamaV2 tensor parallelism to increase multi gpu inference speeds (#6356) 2024-09-28 00:26:03 -03:00
oobabooga
e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
Bartowski
104573f7d4
Update cache_4bit documentation (#5649)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-03-07 13:08:21 -03:00
oobabooga
2ec1d96c91
Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
oobabooga
d6bd71db7f ExLlamaV2: fix loading when autosplit is not set 2024-02-17 12:54:37 -08:00
oobabooga
a6730f88f7
Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga
f1f2c4c3f4
Add --num_experts_per_token parameter (ExLlamav2) (#4955) 2023-12-17 12:08:33 -03:00
oobabooga
c0655475ae Add cache_8bit option 2023-11-02 11:23:04 -07:00
oobabooga
77abd9b69b Add no_flash_attn option 2023-11-02 11:08:53 -07:00
oobabooga
fbac6d21ca Add missing exception 2023-10-20 23:53:24 -07:00
turboderp
ae8cd449ae
ExLlamav2_HF: Convert logits to FP32 (#4310) 2023-10-18 23:16:05 -03:00
Forkoz
8cce1f1126
Exllamav2 lora support (#4229)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-10-14 16:12:41 -03:00
tdrussell
cb26163a20
Fix off-by-one error in exllama_hf caching logic (#4145) 2023-10-05 12:20:56 -03:00
oobabooga
03dc69edc5 ExLlama_HF (v1 and v2) prefix matching 2023-09-19 13:12:19 -07:00
oobabooga
605ec3c9f2 Add a warning about ExLlamaV2 without flash-attn 2023-09-18 12:26:35 -07:00
Panchovix
34dc7306b8
Fix NTK (alpha) and RoPE scaling for exllamav2 and exllamav2_HF (#3897) 2023-09-13 02:35:09 -03:00
oobabooga
b7adf290fc Fix ExLlama-v2 path issue 2023-09-12 17:42:22 -07:00
oobabooga
18e6b275f3 Add alpha_value/compress_pos_emb to ExLlama-v2 2023-09-12 15:02:47 -07:00
oobabooga
c2a309f56e
Add ExLlamaV2 and ExLlamav2_HF loaders (#3881) 2023-09-12 14:33:07 -03:00