oobabooga
104293f411
Add LoRA support
2023-03-16 21:31:39 -03:00
oobabooga
ee164d1821
Don't split the layers in 8-bit mode by default
2023-03-16 18:22:16 -03:00
oobabooga
e085cb4333
Small changes
2023-03-16 13:34:23 -03:00
awoo
83cb20aad8
Add support for --gpu-memory with --load-in-8bit
2023-03-16 18:42:53 +03:00
oobabooga
1c378965e1
Remove unused imports
2023-03-16 10:18:34 -03:00
oobabooga
66256ac1dd
Make the "no GPU has been detected" message more descriptive
2023-03-15 19:31:27 -03:00
oobabooga
265ba384b7
Rename a file, add deprecation warning for --load-in-4bit
2023-03-14 07:56:31 -03:00
Ayanami Rei
8778b756e6
use updated load_quantized
2023-03-13 22:11:40 +03:00
Ayanami Rei
e1c952c41c
make argument non case-sensitive
2023-03-13 20:22:38 +03:00
Ayanami Rei
3c9afd5ca3
rename method
2023-03-13 20:14:40 +03:00
Ayanami Rei
edbc61139f
use new quant loader
2023-03-13 20:00:38 +03:00
oobabooga
65dda28c9d
Rename --llama-bits to --gptq-bits
2023-03-12 11:19:07 -03:00
oobabooga
fed3617f07
Move LLaMA 4-bit into a separate file
2023-03-12 11:12:34 -03:00
draff
001e638b47
Make it actually work
2023-03-10 23:28:19 +00:00
draff
804486214b
Re-implement --load-in-4bit and update --llama-bits arg description
2023-03-10 23:21:01 +00:00
ItsLogic
9ba8156a70
remove unnecessary Path()
2023-03-10 22:33:58 +00:00
draff
e6c631aea4
Replace --load-in-4bit with --llama-bits
...
Replaces --load-in-4bit with a more flexible --llama-bits arg to allow for 2 and 3 bit models as well. This commit also fixes a loading issue with .pt files which are not in the root of the models folder
2023-03-10 21:36:45 +00:00
oobabooga
e9dbdafb14
Merge branch 'main' into pt-path-changes
2023-03-10 11:03:42 -03:00
oobabooga
706a03b2cb
Minor changes
2023-03-10 11:02:25 -03:00
oobabooga
de7dd8b6aa
Add comments
2023-03-10 10:54:08 -03:00
oobabooga
e461c0b7a0
Move the import to the top
2023-03-10 10:51:12 -03:00
deepdiffuser
9fbd60bf22
add no_split_module_classes to prevent tensor split error
2023-03-10 05:30:47 -08:00
deepdiffuser
ab47044459
add multi-gpu support for 4bit gptq LLaMA
2023-03-10 04:52:45 -08:00
rohvani
2ac2913747
fix reference issue
2023-03-09 20:13:23 -08:00
rohvani
826e297b0e
add llama-65b-4bit support & multiple pt paths
2023-03-09 18:31:32 -08:00
oobabooga
9849aac0f1
Don't show .pt models in the list
2023-03-09 21:54:50 -03:00
oobabooga
74102d5ee4
Insert to the path instead of appending
2023-03-09 20:51:22 -03:00
oobabooga
2965aa1625
Check if the .pt file exists
2023-03-09 20:48:51 -03:00
oobabooga
828a524f9a
Add LLaMA 4-bit support
2023-03-09 15:50:26 -03:00
oobabooga
e91f4bc25a
Add RWKV tokenizer
2023-03-06 08:45:49 -03:00
oobabooga
c33715ad5b
Move towards HF LLaMA implementation
2023-03-05 01:20:31 -03:00
oobabooga
bd8aac8fa4
Add LLaMA 8-bit support
2023-03-04 13:28:42 -03:00
oobabooga
ed8b35efd2
Add --pin-weight parameter for FlexGen
2023-03-04 01:04:02 -03:00
oobabooga
ea5c5eb3da
Add LLaMA support
2023-03-03 14:39:14 -03:00
oobabooga
659bb76722
Add RWKVModel class
2023-03-01 12:08:55 -03:00
oobabooga
6837d4d72a
Load the model by name
2023-02-28 02:52:29 -03:00
oobabooga
70e522732c
Move RWKV loader into a separate file
2023-02-27 23:50:16 -03:00
oobabooga
ebc64a408c
RWKV support prototype
2023-02-27 23:03:35 -03:00
oobabooga
8e3e8a070f
Make FlexGen work with the newest API
2023-02-26 16:53:41 -03:00
oobabooga
65326b545a
Move all gradio elements to shared (so that extensions can use them)
2023-02-24 16:46:50 -03:00
oobabooga
f6f792363b
Separate command-line params by spaces instead of commas
2023-02-24 08:55:09 -03:00
luis
5abdc99a7c
gpu-memory arg change
2023-02-23 18:43:55 -05:00
oobabooga
7224343a70
Improve the imports
2023-02-23 14:41:42 -03:00
oobabooga
e46c43afa6
Move some stuff from server.py to modules
2023-02-23 13:42:23 -03:00
oobabooga
1dacd34165
Further refactor
2023-02-23 13:28:30 -03:00