Diner Burger
addad3c63e
Allow more granular KV cache settings (#6561)
2024-12-17 17:43:48 -03:00
oobabooga
4d9ce586d3
Update llama_cpp_python_hijack.py, fix llamacpp_hf
2024-09-30 14:49:21 -07:00
oobabooga
f243b4ca9c
Make llama-cpp-python not crash immediately
2024-07-04 19:16:00 -07:00
oobabooga
4ea260098f
llama.cpp: add 4-bit/8-bit kv cache options
2024-06-29 09:10:33 -07:00
oobabooga
536f8d58d4
Do not expose alpha_value to llama.cpp & rope_freq_base to transformers
To avoid confusion
2024-06-23 22:09:24 -07:00
oobabooga
e61055253c
Bump llama-cpp-python to 0.2.69, add --flash-attn option
2024-05-03 04:31:22 -07:00
oobabooga
51fb766bea
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964)
2024-04-30 09:11:31 -03:00
oobabooga
9b623b8a78
Bump llama-cpp-python to 0.2.64, use official wheels (#5921)
2024-04-23 23:17:05 -03:00
oobabooga
e158299fb4
Fix loading sharded GGUF models through llamacpp_HF
2024-04-11 14:50:05 -07:00
oobabooga
59032140b5
Fix CFG with llamacpp_HF (2nd attempt)
2024-02-19 18:35:42 -08:00
oobabooga
c203c57c18
Fix CFG with llamacpp_HF
2024-02-19 18:09:49 -08:00
oobabooga
86c320ab5a
llama.cpp: add a progress bar for prompt evaluation
2024-02-07 21:56:10 -08:00
Forkoz
2a45620c85
Split by rows instead of layers for llama.cpp multi-gpu (#5435)
2024-02-04 23:36:40 -03:00
kalomaze
48327cc5c4
Dynamic Temperature HF loader support (#5174)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-01-07 10:36:26 -03:00
oobabooga
de138b8ba6
Add llama-cpp-python wheels with tensor cores support (#5003)
2023-12-19 17:30:53 -03:00
oobabooga
0a299d5959
Bump llama-cpp-python to 0.2.24 (#5001)
2023-12-19 15:22:21 -03:00
oobabooga
e0ca49ed9c
Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637)
* Update requirements*.txt
* Add back seed
2023-11-18 00:31:27 -03:00
oobabooga
9d6f79db74
Revert "Bump llama-cpp-python to 0.2.18 (#4611)"
This reverts commit 923c8e25fb.
2023-11-17 05:14:25 -08:00
oobabooga
923c8e25fb
Bump llama-cpp-python to 0.2.18 (#4611)
2023-11-16 22:55:14 -03:00
oobabooga
2af7e382b1
Revert "Bump llama-cpp-python to 0.2.14"
This reverts commit 5c3eb22ce6.
The new version has issues:
https://github.com/oobabooga/text-generation-webui/issues/4540
https://github.com/abetlen/llama-cpp-python/issues/893
2023-11-09 10:02:13 -08:00
oobabooga
af3d25a503
Disable logits_all in llamacpp_HF (makes processing 3x faster)
2023-11-07 14:35:48 -08:00
oobabooga
5c3eb22ce6
Bump llama-cpp-python to 0.2.14
2023-11-07 14:20:43 -08:00
oobabooga
df90d03e0b
Replace --mul_mat_q with --no_mul_mat_q
2023-10-22 12:23:03 -07:00
Brian Dashore
7743b5e9de
Llamacpp_HF: Fix CFG cache init (#4219)
Documentation says that model.context_params should be sent when
a new context is created. The current code uses model.params which
doesn't exist.
Signed-off-by: kingbri <bdashore3@proton.me>
2023-10-07 19:38:29 -03:00
oobabooga
b6fe6acf88
Add threads_batch parameter
2023-10-01 21:28:00 -07:00
jllllll
41a2de96e5
Bump llama-cpp-python to 0.2.11
2023-10-01 18:08:10 -05:00
StoyanStAtanasov
7e6ff8d1f0
Enable NUMA feature for llama_cpp_python (#4040)
2023-09-26 22:05:00 -03:00
oobabooga
2e7b6b0014
Create alternative requirements.txt with AMD and Metal wheels (#4052)
2023-09-24 09:58:29 -03:00
oobabooga
029da9563f
Avoid redundant function call in llamacpp_hf
2023-09-19 14:14:40 -07:00
oobabooga
745807dc03
Faster llamacpp_HF prefix matching
2023-09-18 11:02:45 -07:00
oobabooga
d71465708c
llamacpp_HF prefix matching
2023-09-17 11:51:01 -07:00
oobabooga
ed86878f02
Remove GGML support
2023-09-11 07:44:00 -07:00
oobabooga
8aeae3b3f4
Fix llamacpp_HF loading
2023-08-26 22:15:06 -07:00
oobabooga
7f5370a272
Minor fixes/cosmetics
2023-08-26 22:11:07 -07:00
jllllll
4d61a7d9da
Account for deprecated GGML parameters
2023-08-26 14:07:46 -05:00
jllllll
4a999e3bcd
Use separate llama-cpp-python packages for GGML support
2023-08-26 10:40:08 -05:00
oobabooga
83640d6f43
Replace ggml occurrences with gguf
2023-08-26 01:06:59 -07:00
oobabooga
52ab2a6b9e
Add rope_freq_base parameter for CodeLlama
2023-08-25 06:55:15 -07:00
oobabooga
3320accfdc
Add CFG to llamacpp_HF (second attempt) (#3678)
2023-08-24 20:32:21 -03:00
oobabooga
d6934bc7bc
Implement CFG for ExLlama_HF (#3666)
2023-08-24 16:27:36 -03:00
oobabooga
7cba000421
Bump llama-cpp-python, +tensor_split by @shouyiwang, +mul_mat_q (#3610)
2023-08-18 12:03:34 -03:00
oobabooga
65aa11890f
Refactor everything (#3481)
2023-08-06 21:49:27 -03:00
oobabooga
0af10ab49b
Add Classifier Free Guidance (CFG) for Transformers/ExLlama (#3325)
2023-08-06 17:22:48 -03:00
oobabooga
87dab03dc0
Add the --cpu option for llama.cpp to prevent CUDA from being used (#3432)
2023-08-03 11:00:36 -03:00
oobabooga
b53ed70a70
Make llamacpp_HF 6x faster
2023-08-01 13:18:20 -07:00
oobabooga
b17893a58f
Revert "Add tensor split support for llama.cpp (#3171)"
This reverts commit 031fe7225e.
2023-07-26 07:06:01 -07:00
Shouyi
031fe7225e
Add tensor split support for llama.cpp (#3171)
2023-07-25 18:59:26 -03:00
oobabooga
a07d070b6c
Add llama-2-70b GGML support (#3285)
2023-07-24 16:37:03 -03:00
jllllll
1141987a0d
Add checks for ROCm and unsupported architectures to llama_cpp_cuda loading (#3225)
2023-07-24 11:25:36 -03:00
oobabooga
4b19b74e6c
Add CUDA wheels for llama-cpp-python by jllllll
2023-07-19 19:33:43 -07:00