3980 Commits

Author SHA1 Message Date
oobabooga
9e189947d1 Minor fix after bd7cc4234d (thanks @belladoreai) 2024-05-21 10:37:30 -07:00
oobabooga
ae86292159 Fix getting Phi-3-small-128k-instruct logits 2024-05-21 10:35:00 -07:00
oobabooga
bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga
6a1682aa95 README: update command-line flags with raw --help output
This helps me keep this up-to-date more easily.
2024-05-19 20:28:46 -07:00
Philipp Emanuel Weidmann
852c943769 DRY: A modern repetition penalty that reliably prevents looping (#5677) 2024-05-19 23:53:47 -03:00
oobabooga
9f77ed1b98 --idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
altoiddealer
818b4e0354 Let grammar escape backslashes (#5865) 2024-05-19 20:26:09 -03:00
Tisjwlf
907702c204 Fix gguf multipart file loading (#5857) 2024-05-19 20:22:09 -03:00
Guanghua Lu
d7bd3da35e Add Llama 3 instruction template (#5891) 2024-05-19 20:17:26 -03:00
A0nameless0man
5cb59707f3 fix: grammar not support utf-8 (#5900) 2024-05-19 20:10:39 -03:00
Jari Van Melckebeke
8456d13349 [docs] small docker changes (#5917) 2024-05-19 20:09:37 -03:00
Samuel Wein
b63dc4e325 UI: Warn user if they are trying to load a model from no path (#6006) 2024-05-19 20:05:17 -03:00
dependabot[bot]
2de586f586 Update accelerate requirement from ==0.27.* to ==0.30.* (#5989) 2024-05-19 20:03:18 -03:00
chr
6b546a2c8b llama.cpp: increase the max threads from 32 to 256 (#5889) 2024-05-19 20:02:19 -03:00
oobabooga
abe5ddc883 Merge pull request #6027 from oobabooga/dev
Merge dev branch
2024-05-19 19:01:11 -03:00
oobabooga
a38a37b3b3 llama.cpp: default n_gpu_layers to the maximum value for the model automatically 2024-05-19 10:57:42 -07:00
oobabooga
a4611232b7 Make --verbose output less spammy 2024-05-18 09:57:00 -07:00
oobabooga
0d90b3a25c Bump llama-cpp-python to 0.2.75 2024-05-18 05:26:26 -07:00
oobabooga
e225b0b995 downloader: fix downloading 01-ai/Yi-1.5-34B-Chat 2024-05-12 10:43:50 -07:00
oobabooga
9557f49f2f Bump llama-cpp-python to 0.2.73 2024-05-11 10:53:19 -07:00
oobabooga
9ac528715c Merge pull request #5996 from oobabooga/dev
Merge dev branch
2024-05-08 16:37:26 -03:00
oobabooga
7a728a38eb Update README 2024-05-07 02:59:36 -07:00
oobabooga
d5bde7babc UI: improve the performance of code syntax highlighting 2024-05-06 17:45:03 -07:00
oobabooga
0b193b8553 Downloader: handle one more retry case after 5770e06c48 2024-05-04 19:25:22 -07:00
oobabooga
cb31998605 Add a template for NVIDIA ChatQA models 2024-05-03 08:19:04 -07:00
oobabooga
e9c9483171 Improve the logging messages while loading models 2024-05-03 08:10:44 -07:00
oobabooga
e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga
0476f9fe70 Bump ExLlamaV2 to 0.0.20 2024-05-01 16:20:50 -07:00
oobabooga
ae0f28530c Bump llama-cpp-python to 0.2.68 2024-05-01 08:40:50 -07:00
oobabooga
8f12fb028d Merge pull request #5970 from oobabooga/dev
Merge dev branch
2024-05-01 09:56:23 -03:00
oobabooga
1eba888af6 Update FUNDING.yml 2024-05-01 05:54:21 -07:00
oobabooga
51fb766bea Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga
81f603d09f Merge pull request #5959 from oobabooga/dev
Merge dev branch
2024-04-29 15:45:48 -03:00
oobabooga
5770e06c48 Add a retry mechanism to the model downloader (#5943) 2024-04-27 12:25:28 -03:00
oobabooga
dfdb6fee22 Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
2024-04-26 09:39:27 -07:00
oobabooga
70845c76fb Add back the max_updates_second parameter (#5937) 2024-04-26 10:14:51 -03:00
oobabooga
6761b5e7c6 Improved instruct style (with syntax highlighting & LaTeX rendering) (#5936) 2024-04-26 10:13:11 -03:00
oobabooga
9c04365f54 Detect the airoboros-3_1-yi-34b-200k template 2024-04-25 16:50:54 -07:00
oobabooga
8b1dee3ec8 Detect platypus-yi-34b, CausalLM-RP-34B, 34b-beta instruction templates 2024-04-24 21:47:43 -07:00
oobabooga
4aa481282b Detect the xwin-lm-70b-v0.1 instruction template 2024-04-24 17:02:20 -07:00
oobabooga
ad122361ea Merge pull request #5927 from oobabooga/dev
Merge dev branch
2024-04-24 13:58:53 -03:00
oobabooga
c9b0df16ee Lint 2024-04-24 09:55:00 -07:00
oobabooga
4094813f8d Lint 2024-04-24 09:53:41 -07:00
oobabooga
64e2a9a0a7 Fix the Phi-3 template when used in the UI 2024-04-24 01:34:11 -07:00
oobabooga
f0538efb99 Remove obsolete --tensorcores references 2024-04-24 00:31:28 -07:00
Colin
f3c9103e04 Revert walrus operator for params['max_memory'] (#5878) 2024-04-24 01:09:14 -03:00
Jari Van Melckebeke
c725d97368 nvidia docker: make sure gradio listens on 0.0.0.0 (#5918) 2024-04-23 23:17:55 -03:00
oobabooga
9b623b8a78 Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
Ashley Kleynhans
0877741b03 Bumped ExLlamaV2 to version 0.0.19 to resolve #5851 (#5880) 2024-04-19 19:04:40 -03:00
oobabooga
a4b732c30b Merge pull request #5887 from oobabooga/dev
Merge dev branch
2024-04-19 12:34:50 -03:00