Forkoz
|
1576227f16
|
Fix GGUFs with no BOS token present, mainly qwen2 models. (#6119)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
|
2024-06-14 13:51:01 -03:00 |
|
dependabot[bot]
|
fdd8fab9cf
|
Bump hqq from 0.1.7.post2 to 0.1.7.post3 (#6090)
|
2024-06-14 13:46:35 -03:00 |
|
oobabooga
|
10601850d9
|
Fix after previous commit
|
2024-06-13 19:54:12 -07:00 |
|
oobabooga
|
0f3a423de1
|
Alternative solution to "get next logits" deadlock (#6106)
|
2024-06-13 19:34:16 -07:00 |
|
oobabooga
|
9aef01551d
|
Revert "Use reentrant generation lock (#6107)"
This reverts commit b675151f25.
|
2024-06-13 17:53:07 -07:00 |
|
oobabooga
|
8930bfc5f4
|
Bump PyTorch, ExLlamaV2, flash-attention (#6122)
|
2024-06-13 20:38:31 -03:00 |
|
oobabooga
|
386500aa37
|
Avoid unnecessary calls UI -> backend, to make it faster
|
2024-06-12 20:52:42 -07:00 |
|
Forkoz
|
1d79aa67cf
|
Fix flash-attn UI parameter to actually store true. (#6076)
|
2024-06-13 00:34:54 -03:00 |
|
Belladore
|
3abafee696
|
DRY sampler improvements (#6053)
|
2024-06-12 23:39:11 -03:00 |
|
theo77186
|
b675151f25
|
Use reentrant generation lock (#6107)
|
2024-06-12 23:25:05 -03:00 |
|
oobabooga
|
a36fa73071
|
Lint
|
2024-06-12 19:00:21 -07:00 |
|
oobabooga
|
2d196ed2fe
|
Remove obsolete pre_layer parameter
|
2024-06-12 18:56:44 -07:00 |
|
Belladore
|
46174a2d33
|
Fix error when bos_token_id is None. (#6061)
|
2024-06-12 22:52:27 -03:00 |
|
Belladore
|
a363cdfca1
|
Fix missing bos token for some models (including Llama-3) (#6050)
|
2024-05-27 09:21:30 -03:00 |
|
oobabooga
|
8df68b05e9
|
Remove MinPLogitsWarper (it's now a transformers built-in)
|
2024-05-27 05:03:30 -07:00 |
|
oobabooga
|
4f1e96b9e3
|
Downloader: Add --model-dir argument, respect --model-dir in the UI
|
2024-05-23 20:42:46 -07:00 |
|
oobabooga
|
ad54d524f7
|
Revert "Fix stopping strings for llama-3 and phi (#6043)"
This reverts commit 5499bc9bc8.
|
2024-05-22 17:18:08 -07:00 |
|
oobabooga
|
5499bc9bc8
|
Fix stopping strings for llama-3 and phi (#6043)
|
2024-05-22 13:53:59 -03:00 |
|
rohitanshu
|
8aaa0a6f4e
|
Fixed minor typo in docs - Training Tab.md (#6038)
|
2024-05-21 14:52:22 -03:00 |
|
oobabooga
|
9e189947d1
|
Minor fix after bd7cc4234d (thanks @belladoreai)
|
2024-05-21 10:37:30 -07:00 |
|
oobabooga
|
ae86292159
|
Fix getting Phi-3-small-128k-instruct logits
|
2024-05-21 10:35:00 -07:00 |
|
oobabooga
|
bd7cc4234d
|
Backend cleanup (#6025)
|
2024-05-21 13:32:02 -03:00 |
|
oobabooga
|
6a1682aa95
|
README: update command-line flags with raw --help output
This helps me keep this up-to-date more easily.
|
2024-05-19 20:28:46 -07:00 |
|
Philipp Emanuel Weidmann
|
852c943769
|
DRY: A modern repetition penalty that reliably prevents looping (#5677)
|
2024-05-19 23:53:47 -03:00 |
|
oobabooga
|
9f77ed1b98
|
--idle-timeout flag to unload the model if unused for N minutes (#6026)
|
2024-05-19 23:29:39 -03:00 |
|
altoiddealer
|
818b4e0354
|
Let grammar escape backslashes (#5865)
|
2024-05-19 20:26:09 -03:00 |
|
Tisjwlf
|
907702c204
|
Fix gguf multipart file loading (#5857)
|
2024-05-19 20:22:09 -03:00 |
|
Guanghua Lu
|
d7bd3da35e
|
Add Llama 3 instruction template (#5891)
|
2024-05-19 20:17:26 -03:00 |
|
A0nameless0man
|
5cb59707f3
|
fix: grammar not support utf-8 (#5900)
|
2024-05-19 20:10:39 -03:00 |
|
Jari Van Melckebeke
|
8456d13349
|
[docs] small docker changes (#5917)
|
2024-05-19 20:09:37 -03:00 |
|
Samuel Wein
|
b63dc4e325
|
UI: Warn user if they are trying to load a model from no path (#6006)
|
2024-05-19 20:05:17 -03:00 |
|
dependabot[bot]
|
2de586f586
|
Update accelerate requirement from ==0.27.* to ==0.30.* (#5989)
|
2024-05-19 20:03:18 -03:00 |
|
chr
|
6b546a2c8b
|
llama.cpp: increase the max threads from 32 to 256 (#5889)
|
2024-05-19 20:02:19 -03:00 |
|
oobabooga
|
a38a37b3b3
|
llama.cpp: default n_gpu_layers to the maximum value for the model automatically
|
2024-05-19 10:57:42 -07:00 |
|
oobabooga
|
a4611232b7
|
Make --verbose output less spammy
|
2024-05-18 09:57:00 -07:00 |
|
oobabooga
|
0d90b3a25c
|
Bump llama-cpp-python to 0.2.75
|
2024-05-18 05:26:26 -07:00 |
|
oobabooga
|
e225b0b995
|
downloader: fix downloading 01-ai/Yi-1.5-34B-Chat
|
2024-05-12 10:43:50 -07:00 |
|
oobabooga
|
9557f49f2f
|
Bump llama-cpp-python to 0.2.73
|
2024-05-11 10:53:19 -07:00 |
|
oobabooga
|
7a728a38eb
|
Update README
|
2024-05-07 02:59:36 -07:00 |
|
oobabooga
|
d5bde7babc
|
UI: improve the performance of code syntax highlighting
|
2024-05-06 17:45:03 -07:00 |
|
oobabooga
|
0b193b8553
|
Downloader: handle one more retry case after 5770e06c48
|
2024-05-04 19:25:22 -07:00 |
|
oobabooga
|
cb31998605
|
Add a template for NVIDIA ChatQA models
|
2024-05-03 08:19:04 -07:00 |
|
oobabooga
|
e9c9483171
|
Improve the logging messages while loading models
|
2024-05-03 08:10:44 -07:00 |
|
oobabooga
|
e61055253c
|
Bump llama-cpp-python to 0.2.69, add --flash-attn option
|
2024-05-03 04:31:22 -07:00 |
|
oobabooga
|
0476f9fe70
|
Bump ExLlamaV2 to 0.0.20
|
2024-05-01 16:20:50 -07:00 |
|
oobabooga
|
ae0f28530c
|
Bump llama-cpp-python to 0.2.68
|
2024-05-01 08:40:50 -07:00 |
|
oobabooga
|
1eba888af6
|
Update FUNDING.yml
|
2024-05-01 05:54:21 -07:00 |
|
oobabooga
|
51fb766bea
|
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964)
|
2024-04-30 09:11:31 -03:00 |
|
oobabooga
|
5770e06c48
|
Add a retry mechanism to the model downloader (#5943)
|
2024-04-27 12:25:28 -03:00 |
|
oobabooga
|
dfdb6fee22
|
Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
|
2024-04-26 09:39:27 -07:00 |
|