Commit Graph

3698 Commits

Author · SHA1 · Message · Date
Forkoz · 1576227f16 · Fix GGUFs with no BOS token present, mainly qwen2 models. (#6119) · 2024-06-14 13:51:01 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
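The BOS-related fixes in this log (1576227f16 here, and a363cdfca1 / 46174a2d33 further down) all revolve around the same guard: only prepend a BOS token when the tokenizer actually defines one, and never duplicate it. A minimal sketch of that idea, with a hypothetical `prepend_bos` helper (not the repository's actual function):

```python
def prepend_bos(token_ids, bos_token_id):
    """Prepend the BOS token only when the tokenizer defines one
    and the prompt does not already start with it."""
    if bos_token_id is None:              # e.g. GGUFs with no BOS token
        return list(token_ids)
    if token_ids and token_ids[0] == bos_token_id:
        return list(token_ids)            # avoid a duplicated BOS
    return [bos_token_id] + list(token_ids)
```

The `None` check is the crux of 46174a2d33: code that unconditionally compared or prepended `bos_token_id` would crash or misbehave when the model ships without one.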
dependabot[bot] · fdd8fab9cf · Bump hqq from 0.1.7.post2 to 0.1.7.post3 (#6090) · 2024-06-14 13:46:35 -03:00
oobabooga · 10601850d9 · Fix after previous commit · 2024-06-13 19:54:12 -07:00
oobabooga · 0f3a423de1 · Alternative solution to "get next logits" deadlock (#6106) · 2024-06-13 19:34:16 -07:00
oobabooga · 9aef01551d · Revert "Use reentrant generation lock (#6107)" · 2024-06-13 17:53:07 -07:00
    This reverts commit b675151f25.
oobabooga · 8930bfc5f4 · Bump PyTorch, ExLlamaV2, flash-attention (#6122) · 2024-06-13 20:38:31 -03:00
oobabooga · 386500aa37 · Avoid unnecessary calls UI -> backend, to make it faster · 2024-06-12 20:52:42 -07:00
Forkoz · 1d79aa67cf · Fix flash-attn UI parameter to actually store true. (#6076) · 2024-06-13 00:34:54 -03:00
Belladore · 3abafee696 · DRY sampler improvements (#6053) · 2024-06-12 23:39:11 -03:00
theo77186 · b675151f25 · Use reentrant generation lock (#6107) · 2024-06-12 23:25:05 -03:00
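The lock saga here (b675151f25 introduced a reentrant lock, 9aef01551d reverted it, 0f3a423de1 shipped an alternative fix) is about a classic deadlock: if the thread holding a plain `threading.Lock` re-enters the locked section (as the "get next logits" path apparently did), it blocks forever waiting on itself. A `threading.RLock` lets the same thread re-acquire. A toy illustration, with hypothetical function names:

```python
import threading

generation_lock = threading.RLock()   # reentrant: the owning thread may re-acquire

def generate(prompt, depth=0):
    # With a plain threading.Lock(), the nested call below would deadlock:
    # the thread would block on a lock it already holds.
    with generation_lock:
        if depth == 0:
            # hypothetical nested entry, e.g. a logits request re-entering
            return generate(prompt, depth=1)
        return f"generated: {prompt}"
```

The trade-off, and a plausible reason for the revert, is that reentrancy can mask unintended nesting; the alternative in 0f3a423de1 restructures the code so re-entry does not happen at all.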
oobabooga · a36fa73071 · Lint · 2024-06-12 19:00:21 -07:00
oobabooga · 2d196ed2fe · Remove obsolete pre_layer parameter · 2024-06-12 18:56:44 -07:00
Belladore · 46174a2d33 · Fix error when bos_token_id is None. (#6061) · 2024-06-12 22:52:27 -03:00
Belladore · a363cdfca1 · Fix missing bos token for some models (including Llama-3) (#6050) · 2024-05-27 09:21:30 -03:00
oobabooga · 8df68b05e9 · Remove MinPLogitsWarper (it's now a transformers built-in) · 2024-05-27 05:03:30 -07:00
oobabooga · 4f1e96b9e3 · Downloader: Add --model-dir argument, respect --model-dir in the UI · 2024-05-23 20:42:46 -07:00
oobabooga · ad54d524f7 · Revert "Fix stopping strings for llama-3 and phi (#6043)" · 2024-05-22 17:18:08 -07:00
    This reverts commit 5499bc9bc8.
oobabooga · 5499bc9bc8 · Fix stopping strings for llama-3 and phi (#6043) · 2024-05-22 13:53:59 -03:00
rohitanshu · 8aaa0a6f4e · Fixed minor typo in docs - Training Tab.md (#6038) · 2024-05-21 14:52:22 -03:00
oobabooga · 9e189947d1 · Minor fix after bd7cc4234d (thanks @belladoreai) · 2024-05-21 10:37:30 -07:00
oobabooga · ae86292159 · Fix getting Phi-3-small-128k-instruct logits · 2024-05-21 10:35:00 -07:00
oobabooga · bd7cc4234d · Backend cleanup (#6025) · 2024-05-21 13:32:02 -03:00
oobabooga · 6a1682aa95 · README: update command-line flags with raw --help output · 2024-05-19 20:28:46 -07:00
    This helps me keep this up-to-date more easily.
Philipp Emanuel Weidmann · 852c943769 · DRY: A modern repetition penalty that reliably prevents looping (#5677) · 2024-05-19 23:53:47 -03:00
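DRY (852c943769, later improved in 3abafee696) penalizes a candidate token when emitting it would extend a token sequence that already occurred verbatim earlier in the context, with the penalty growing exponentially in the length of the would-be repetition. A heavily simplified sketch of that idea follows; the function names are made up and the real implementation operates on logit tensors with more machinery (sequence breakers, efficient matching):

```python
def dry_match_length(history, candidate):
    """Longest suffix of `history` that, extended by `candidate`,
    already occurs verbatim earlier in `history` (brute-force sketch)."""
    best = 0
    for length in range(1, len(history)):
        pattern = history[-length:] + [candidate]
        occurs = any(history[i:i + len(pattern)] == pattern
                     for i in range(len(history) - len(pattern) + 1))
        if occurs:
            best = length
    return best

def dry_penalty(history, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty to subtract from the candidate's logit; zero until the
    repetition would exceed allowed_length, then grows exponentially."""
    n = dry_match_length(history, candidate)
    if n < allowed_length:
        return 0.0
    return multiplier * base ** (n - allowed_length)
```

For example, with history `[1, 2, 3, 1, 2]` the candidate `3` would recreate the earlier trigram `1 2 3`, so it is penalized, while an unseen token is not.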
oobabooga · 9f77ed1b98 · --idle-timeout flag to unload the model if unused for N minutes (#6026) · 2024-05-19 23:29:39 -03:00
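The `--idle-timeout` feature (9f77ed1b98) boils down to a resettable countdown: every request restarts a timer, and if the timer ever fires, the model is unloaded to free memory. A minimal sketch of that pattern using `threading.Timer` (the class name is hypothetical, and the real flag counts minutes, not seconds):

```python
import threading

class IdleUnloader:
    """Run `unload_fn` after `timeout` seconds with no activity."""

    def __init__(self, timeout, unload_fn):
        self.timeout = timeout
        self.unload_fn = unload_fn
        self._timer = None

    def touch(self):
        # Called on every request: cancel any pending countdown and restart it.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout, self.unload_fn)
        self._timer.daemon = True
        self._timer.start()
```

Each `touch()` pushes the unload further into the future; only a quiet period of the full `timeout` lets it fire.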
altoiddealer · 818b4e0354 · Let grammar escape backslashes (#5865) · 2024-05-19 20:26:09 -03:00
Tisjwlf · 907702c204 · Fix gguf multipart file loading (#5857) · 2024-05-19 20:22:09 -03:00
Guanghua Lu · d7bd3da35e · Add Llama 3 instruction template (#5891) · 2024-05-19 20:17:26 -03:00
A0nameless0man · 5cb59707f3 · fix: grammar not support utf-8 (#5900) · 2024-05-19 20:10:39 -03:00
Jari Van Melckebeke · 8456d13349 · [docs] small docker changes (#5917) · 2024-05-19 20:09:37 -03:00
Samuel Wein · b63dc4e325 · UI: Warn user if they are trying to load a model from no path (#6006) · 2024-05-19 20:05:17 -03:00
dependabot[bot] · 2de586f586 · Update accelerate requirement from ==0.27.* to ==0.30.* (#5989) · 2024-05-19 20:03:18 -03:00
chr · 6b546a2c8b · llama.cpp: increase the max threads from 32 to 256 (#5889) · 2024-05-19 20:02:19 -03:00
oobabooga · a38a37b3b3 · llama.cpp: default n_gpu_layers to the maximum value for the model automatically · 2024-05-19 10:57:42 -07:00
oobabooga · a4611232b7 · Make --verbose output less spammy · 2024-05-18 09:57:00 -07:00
oobabooga · 0d90b3a25c · Bump llama-cpp-python to 0.2.75 · 2024-05-18 05:26:26 -07:00
oobabooga · e225b0b995 · downloader: fix downloading 01-ai/Yi-1.5-34B-Chat · 2024-05-12 10:43:50 -07:00
oobabooga · 9557f49f2f · Bump llama-cpp-python to 0.2.73 · 2024-05-11 10:53:19 -07:00
oobabooga · 7a728a38eb · Update README · 2024-05-07 02:59:36 -07:00
oobabooga · d5bde7babc · UI: improve the performance of code syntax highlighting · 2024-05-06 17:45:03 -07:00
oobabooga · 0b193b8553 · Downloader: handle one more retry case after 5770e06c48 · 2024-05-04 19:25:22 -07:00
oobabooga · cb31998605 · Add a template for NVIDIA ChatQA models · 2024-05-03 08:19:04 -07:00
oobabooga · e9c9483171 · Improve the logging messages while loading models · 2024-05-03 08:10:44 -07:00
oobabooga · e61055253c · Bump llama-cpp-python to 0.2.69, add --flash-attn option · 2024-05-03 04:31:22 -07:00
oobabooga · 0476f9fe70 · Bump ExLlamaV2 to 0.0.20 · 2024-05-01 16:20:50 -07:00
oobabooga · ae0f28530c · Bump llama-cpp-python to 0.2.68 · 2024-05-01 08:40:50 -07:00
oobabooga · 1eba888af6 · Update FUNDING.yml · 2024-05-01 05:54:21 -07:00
oobabooga · 51fb766bea · Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) · 2024-04-30 09:11:31 -03:00
oobabooga · 5770e06c48 · Add a retry mechanism to the model downloader (#5943) · 2024-04-27 12:25:28 -03:00
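The downloader retry work (5770e06c48, refined by 0b193b8553) is, at its core, retry-with-backoff around a flaky network call. A generic sketch of the pattern, with hypothetical names and parameters that do not claim to match the actual downloader:

```python
import time

def download_with_retries(fetch, max_attempts=5, base_delay=1.0):
    """Call `fetch` until it succeeds, sleeping with exponential backoff
    between attempts; re-raise the last error once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)
```

The follow-up commit 0b193b8553 is a reminder that the hard part of retry logic is usually deciding which failure cases are transient enough to retry at all.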
oobabooga · dfdb6fee22 · Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit · 2024-04-26 09:39:27 -07:00
    To allow for 32-bit CPU offloading (it's very slow).
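`llm_int8_enable_fp32_cpu_offload` is a real `BitsAndBytesConfig` option in transformers: with it enabled, layers that do not fit on the GPU can stay on the CPU in fp32 instead of raising an out-of-memory error. A config sketch of roughly what dfdb6fee22 enables (the model id is a placeholder, and the repository's actual loading code differs):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 4-bit on the GPU, but allow overflow layers
# to remain on the CPU in fp32 (functional, but very slow).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "some/model",                     # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",                # lets accelerate split GPU/CPU
)
```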