oobabooga
af3d25a503
Disable logits_all in llamacpp_HF (makes processing 3x faster)
2023-11-07 14:35:48 -08:00
oobabooga
5c3eb22ce6
Bump llama-cpp-python to 0.2.14
2023-11-07 14:20:43 -08:00
oobabooga
df90d03e0b
Replace --mul_mat_q with --no_mul_mat_q
2023-10-22 12:23:03 -07:00
Brian Dashore
7743b5e9de
Llamacpp_HF: Fix CFG cache init (#4219)
Documentation says that model.context_params should be sent when
a new context is created. The current code uses model.params which
doesn't exist.
Signed-off-by: kingbri <bdashore3@proton.me>
2023-10-07 19:38:29 -03:00
oobabooga
b6fe6acf88
Add threads_batch parameter
2023-10-01 21:28:00 -07:00
jllllll
41a2de96e5
Bump llama-cpp-python to 0.2.11
2023-10-01 18:08:10 -05:00
StoyanStAtanasov
7e6ff8d1f0
Enable NUMA feature for llama_cpp_python (#4040)
2023-09-26 22:05:00 -03:00
oobabooga
2e7b6b0014
Create alternative requirements.txt with AMD and Metal wheels (#4052)
2023-09-24 09:58:29 -03:00
oobabooga
029da9563f
Avoid redundant function call in llamacpp_hf
2023-09-19 14:14:40 -07:00
oobabooga
745807dc03
Faster llamacpp_HF prefix matching
2023-09-18 11:02:45 -07:00
oobabooga
d71465708c
llamacpp_HF prefix matching
2023-09-17 11:51:01 -07:00
oobabooga
ed86878f02
Remove GGML support
2023-09-11 07:44:00 -07:00
oobabooga
8aeae3b3f4
Fix llamacpp_HF loading
2023-08-26 22:15:06 -07:00
oobabooga
7f5370a272
Minor fixes/cosmetics
2023-08-26 22:11:07 -07:00
jllllll
4d61a7d9da
Account for deprecated GGML parameters
2023-08-26 14:07:46 -05:00
jllllll
4a999e3bcd
Use separate llama-cpp-python packages for GGML support
2023-08-26 10:40:08 -05:00
oobabooga
83640d6f43
Replace ggml occurrences with gguf
2023-08-26 01:06:59 -07:00
oobabooga
52ab2a6b9e
Add rope_freq_base parameter for CodeLlama
2023-08-25 06:55:15 -07:00
oobabooga
3320accfdc
Add CFG to llamacpp_HF (second attempt) (#3678)
2023-08-24 20:32:21 -03:00
oobabooga
d6934bc7bc
Implement CFG for ExLlama_HF (#3666)
2023-08-24 16:27:36 -03:00
oobabooga
7cba000421
Bump llama-cpp-python, +tensor_split by @shouyiwang, +mul_mat_q (#3610)
2023-08-18 12:03:34 -03:00
oobabooga
65aa11890f
Refactor everything (#3481)
2023-08-06 21:49:27 -03:00
oobabooga
0af10ab49b
Add Classifier Free Guidance (CFG) for Transformers/ExLlama (#3325)
2023-08-06 17:22:48 -03:00
oobabooga
87dab03dc0
Add the --cpu option for llama.cpp to prevent CUDA from being used (#3432)
2023-08-03 11:00:36 -03:00
oobabooga
b53ed70a70
Make llamacpp_HF 6x faster
2023-08-01 13:18:20 -07:00
oobabooga
b17893a58f
Revert "Add tensor split support for llama.cpp (#3171)"
This reverts commit 031fe7225e.
2023-07-26 07:06:01 -07:00
Shouyi
031fe7225e
Add tensor split support for llama.cpp (#3171)
2023-07-25 18:59:26 -03:00
oobabooga
a07d070b6c
Add llama-2-70b GGML support (#3285)
2023-07-24 16:37:03 -03:00
jllllll
1141987a0d
Add checks for ROCm and unsupported architectures to llama_cpp_cuda loading (#3225)
2023-07-24 11:25:36 -03:00
oobabooga
4b19b74e6c
Add CUDA wheels for llama-cpp-python by jllllll
2023-07-19 19:33:43 -07:00
randoentity
a69955377a
[GGML] Support for customizable RoPE (#3083)
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-07-17 22:32:37 -03:00
oobabooga
a199f21799
Optimize llamacpp_hf a bit
2023-07-16 20:49:48 -07:00
oobabooga
6a3edb0542
Clean up llamacpp_hf.py
2023-07-15 22:40:55 -07:00
oobabooga
5e3f7e00a9
Create llamacpp_HF loader (#3062)
2023-07-16 02:21:13 -03:00