Commit Graph

195 Commits

Author SHA1 Message Date
oobabooga
dfdb6fee22 Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
2024-04-26 09:39:27 -07:00
oobabooga
4094813f8d Lint 2024-04-24 09:53:41 -07:00
Colin
f3c9103e04
Revert walrus operator for params['max_memory'] (#5878) 2024-04-24 01:09:14 -03:00
wangshuai09
fd4e46bce2
Add Ascend NPU support (basic) (#5541) 2024-04-11 18:42:20 -03:00
oobabooga
d02744282b Minor logging change 2024-04-06 18:56:58 -07:00
oobabooga
1bdceea2d4 UI: Focus on the chat input after starting a new chat 2024-04-06 12:57:57 -07:00
oobabooga
1b87844928 Minor fix 2024-04-05 18:43:43 -07:00
oobabooga
6b7f7555fc Logging message to make transformers loader a bit more transparent 2024-04-05 18:40:02 -07:00
oobabooga
308452b783 Bitsandbytes: load preconverted 4bit models without additional flags 2024-04-04 18:10:24 -07:00
oobabooga
d423021a48
Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga
13fe38eb27 Remove specialized code for gpt-4chan 2024-04-04 16:11:47 -07:00
oobabooga
4039999be5 Autodetect llamacpp_HF loader when tokenizer exists 2024-02-16 09:29:26 -08:00
oobabooga
b2b74c83a6 Fix Qwen1.5 in llamacpp_HF 2024-02-15 19:04:19 -08:00
oobabooga
d47182d9d1
llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) 2024-02-14 00:28:51 -03:00
oobabooga
4e34ae0587 Minor logging improvements 2024-02-06 08:22:08 -08:00
oobabooga
8ee3cea7cb Improve some log messages 2024-02-06 06:31:27 -08:00
oobabooga
2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)"
This reverts commit cde000d478.
2024-02-06 06:21:36 -08:00
oobabooga
cde000d478
Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
sam-ngu
c0bdcee646
added trust_remote_code to deepspeed init loaderClass (#5237) 2024-01-26 11:10:57 -03:00
oobabooga
89e7e107fc Lint 2024-01-09 16:27:50 -08:00
oobabooga
94afa0f9cf Minor style changes 2024-01-01 16:00:22 -08:00
oobabooga
2734ce3e4c
Remove RWKV loader (#5130) 2023-12-31 02:01:40 -03:00
oobabooga
0e54a09bcb
Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga
8e397915c9
Remove --sdp-attention, --xformers flags (#5126) 2023-12-31 01:36:51 -03:00
Yiximail
afc91edcb2
Reset the model_name after unloading the model (#5051) 2023-12-22 22:18:24 -03:00
oobabooga
f0f6d9bdf9 Add HQQ back & update version
This reverts commit 2289e9031e.
2023-12-20 07:46:09 -08:00
oobabooga
fadb295d4d Lint 2023-12-19 21:36:57 -08:00
oobabooga
fb8ee9f7ff Add a specific error if HQQ is missing 2023-12-19 21:32:58 -08:00
oobabooga
9992f7d8c0 Improve several log messages 2023-12-19 20:54:32 -08:00
Water
674be9a09a
Add HQQ quant loader (#4888)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-12-18 21:23:16 -03:00
oobabooga
3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga
39d2fe1ed9
Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga
2a335b8aa7 Cleanup: set shared.model_name only once 2023-12-08 06:35:23 -08:00
oobabooga
98361af4d5
Add QuIP# support (#4803)
It has to be installed manually for now.
2023-12-06 00:01:01 -03:00
oobabooga
77d6ccf12b Add a LOADER debug message while loading models 2023-11-30 12:00:32 -08:00
oobabooga
8b66d83aa9 Set use_fast=True by default, create --no_use_fast flag
This increases tokens/second for HF loaders.
2023-11-16 19:55:28 -08:00
oobabooga
a85ce5f055 Add more info messages for truncation / instruction template 2023-11-15 16:20:31 -08:00
oobabooga
883701bc40 Alternative solution to 025da386a0
Fixes an error.
2023-11-15 16:04:02 -08:00
oobabooga
8ac942813c Revert "Fix CPU memory limit error (issue #3763) (#4597)"
This reverts commit 025da386a0.
2023-11-15 16:01:54 -08:00
oobabooga
e6f44d6d19 Print context length / instruction template to terminal when loading models 2023-11-15 16:00:51 -08:00
Andy Bao
025da386a0
Fix CPU memory limit error (issue #3763) (#4597)
get_max_memory_dict() was not properly formatting shared.args.cpu_memory

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-11-15 20:27:20 -03:00
oobabooga
2358706453 Add /v1/internal/model/load endpoint (tentative) 2023-11-07 20:58:06 -08:00
oobabooga
ec17a5d2b7
Make OpenAI API the default API (#4430) 2023-11-06 02:38:29 -03:00
feng lui
4766a57352
transformers: add use_flash_attention_2 option (#4373) 2023-11-04 13:59:33 -03:00
Julien Chaumond
fdcaa955e3
transformers: Add a flag to force load from safetensors (#4450) 2023-11-02 16:20:54 -03:00
oobabooga
839a87bac8 Fix is_ccl_available & is_xpu_available imports 2023-10-26 20:27:04 -07:00
Abhilash Majumder
778a010df8
Intel Gpu support initialization (#4340) 2023-10-26 23:39:51 -03:00
oobabooga
ef1489cd4d Remove unused parameter in AutoAWQ 2023-10-23 20:45:43 -07:00
oobabooga
8ea554bc19 Check for torch.xpu.is_available() 2023-10-16 12:53:40 -07:00
oobabooga
b88b2b74a6 Experimental Intel Arc transformers support (untested) 2023-10-15 20:51:11 -07:00