Maya Eary
|
c8207d474f
|
Generalized load_quantized
|
2023-03-28 20:38:55 +03:00 |
|
oobabooga
|
49c10c5570
|
Add support for the latest GPTQ models with group-size (#530)
**Warning: old 4-bit weights will not work anymore!**
See here for how to get up-to-date weights: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#step-2-get-the-pre-converted-weights
|
2023-03-26 00:11:33 -03:00 |
|
EyeDeck
|
dcfd866402
|
Allow loading of .safetensors through GPTQ-for-LLaMa
|
2023-03-23 21:31:34 -04:00 |
|
oobabooga
|
db4219a340
|
Update comments
|
2023-03-20 16:40:08 -03:00 |
|
oobabooga
|
7618f3fe8c
|
Add --gptq-preload for 4-bit offloading (#460)
This now works on a 4GB card:
```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
|
2023-03-20 16:30:56 -03:00 |
|
oobabooga
|
9a3bed50c3
|
Attempt at fixing 4-bit with CPU offload
|
2023-03-20 15:11:56 -03:00 |
|
askmyteapot
|
53b6a66beb
|
Update GPTQ_Loader.py
Correcting decoder layer for renamed class.
|
2023-03-17 18:34:13 +10:00 |
|
oobabooga
|
265ba384b7
|
Rename a file, add deprecation warning for --load-in-4bit
|
2023-03-14 07:56:31 -03:00 |
|