oobabooga | dfdb6fee22 | Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit | 2024-04-26 09:39:27 -07:00
  To allow for 32-bit CPU offloading (it's very slow).
oobabooga | 4094813f8d | Lint | 2024-04-24 09:53:41 -07:00
Colin | f3c9103e04 | Revert walrus operator for params['max_memory'] (#5878) | 2024-04-24 01:09:14 -03:00
wangshuai09 | fd4e46bce2 | Add Ascend NPU support (basic) (#5541) | 2024-04-11 18:42:20 -03:00
oobabooga | d02744282b | Minor logging change | 2024-04-06 18:56:58 -07:00
oobabooga | 1bdceea2d4 | UI: Focus on the chat input after starting a new chat | 2024-04-06 12:57:57 -07:00
oobabooga | 1b87844928 | Minor fix | 2024-04-05 18:43:43 -07:00
oobabooga | 6b7f7555fc | Logging message to make transformers loader a bit more transparent | 2024-04-05 18:40:02 -07:00
oobabooga | 308452b783 | Bitsandbytes: load preconverted 4bit models without additional flags | 2024-04-04 18:10:24 -07:00
oobabooga | d423021a48 | Remove CTransformers support (#5807) | 2024-04-04 20:23:58 -03:00
oobabooga | 13fe38eb27 | Remove specialized code for gpt-4chan | 2024-04-04 16:11:47 -07:00
oobabooga | 4039999be5 | Autodetect llamacpp_HF loader when tokenizer exists | 2024-02-16 09:29:26 -08:00
oobabooga | b2b74c83a6 | Fix Qwen1.5 in llamacpp_HF | 2024-02-15 19:04:19 -08:00
oobabooga | d47182d9d1 | llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) | 2024-02-14 00:28:51 -03:00
oobabooga | 4e34ae0587 | Minor logging improvements | 2024-02-06 08:22:08 -08:00
oobabooga | 8ee3cea7cb | Improve some log messages | 2024-02-06 06:31:27 -08:00
oobabooga | 2a1063eff5 | Revert "Remove non-HF ExLlamaV2 loader (#5431)" | 2024-02-06 06:21:36 -08:00
  This reverts commit cde000d478.
oobabooga | cde000d478 | Remove non-HF ExLlamaV2 loader (#5431) | 2024-02-04 01:15:51 -03:00
sam-ngu | c0bdcee646 | added trust_remote_code to deepspeed init loaderClass (#5237) | 2024-01-26 11:10:57 -03:00
oobabooga | 89e7e107fc | Lint | 2024-01-09 16:27:50 -08:00
oobabooga | 94afa0f9cf | Minor style changes | 2024-01-01 16:00:22 -08:00
oobabooga | 2734ce3e4c | Remove RWKV loader (#5130) | 2023-12-31 02:01:40 -03:00
oobabooga | 0e54a09bcb | Remove exllamav1 loaders (#5128) | 2023-12-31 01:57:06 -03:00
oobabooga | 8e397915c9 | Remove --sdp-attention, --xformers flags (#5126) | 2023-12-31 01:36:51 -03:00
Yiximail | afc91edcb2 | Reset the model_name after unloading the model (#5051) | 2023-12-22 22:18:24 -03:00
oobabooga | f0f6d9bdf9 | Add HQQ back & update version | 2023-12-20 07:46:09 -08:00
  This reverts commit 2289e9031e.
oobabooga | fadb295d4d | Lint | 2023-12-19 21:36:57 -08:00
oobabooga | fb8ee9f7ff | Add a specific error if HQQ is missing | 2023-12-19 21:32:58 -08:00
oobabooga | 9992f7d8c0 | Improve several log messages | 2023-12-19 20:54:32 -08:00
Water | 674be9a09a | Add HQQ quant loader (#4888) | 2023-12-18 21:23:16 -03:00
  ---------
  Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga | 3bbf6c601d | AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) | 2023-12-15 06:46:13 -08:00
oobabooga | 39d2fe1ed9 | Jinja templates for Instruct and Chat (#4874) | 2023-12-12 17:23:14 -03:00
oobabooga | 2a335b8aa7 | Cleanup: set shared.model_name only once | 2023-12-08 06:35:23 -08:00
oobabooga | 98361af4d5 | Add QuIP# support (#4803) | 2023-12-06 00:01:01 -03:00
  It has to be installed manually for now.
oobabooga | 77d6ccf12b | Add a LOADER debug message while loading models | 2023-11-30 12:00:32 -08:00
oobabooga | 8b66d83aa9 | Set use_fast=True by default, create --no_use_fast flag | 2023-11-16 19:55:28 -08:00
  This increases tokens/second for HF loaders.
oobabooga | a85ce5f055 | Add more info messages for truncation / instruction template | 2023-11-15 16:20:31 -08:00
oobabooga | 883701bc40 | Alternative solution to 025da386a0 | 2023-11-15 16:04:02 -08:00
  Fixes an error.
oobabooga | 8ac942813c | Revert "Fix CPU memory limit error (issue #3763) (#4597)" | 2023-11-15 16:01:54 -08:00
  This reverts commit 025da386a0.
oobabooga | e6f44d6d19 | Print context length / instruction template to terminal when loading models | 2023-11-15 16:00:51 -08:00
Andy Bao | 025da386a0 | Fix CPU memory limit error (issue #3763) (#4597) | 2023-11-15 20:27:20 -03:00
  get_max_memory_dict() was not properly formatting shared.args.cpu_memory
  Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga | 2358706453 | Add /v1/internal/model/load endpoint (tentative) | 2023-11-07 20:58:06 -08:00
oobabooga | ec17a5d2b7 | Make OpenAI API the default API (#4430) | 2023-11-06 02:38:29 -03:00
feng lui | 4766a57352 | transformers: add use_flash_attention_2 option (#4373) | 2023-11-04 13:59:33 -03:00
Julien Chaumond | fdcaa955e3 | transformers: Add a flag to force load from safetensors (#4450) | 2023-11-02 16:20:54 -03:00
oobabooga | 839a87bac8 | Fix is_ccl_available & is_xpu_available imports | 2023-10-26 20:27:04 -07:00
Abhilash Majumder | 778a010df8 | Intel Gpu support initialization (#4340) | 2023-10-26 23:39:51 -03:00
oobabooga | ef1489cd4d | Remove unused parameter in AutoAWQ | 2023-10-23 20:45:43 -07:00
oobabooga | 8ea554bc19 | Check for torch.xpu.is_available() | 2023-10-16 12:53:40 -07:00
oobabooga | b88b2b74a6 | Experimental Intel Arc transformers support (untested) | 2023-10-15 20:51:11 -07:00