text-generation-webui/DeepSpeed.md at 0a17565e53ed1bbe92e4ee4196d71618bc019ccf

mirror of https://github.com/oobabooga/text-generation-webui.git synced 2024-11-24 17:06:53 +01:00

oobabooga 7584d46c29

Refactor models.py (#2113)

2023-05-16 19:52:22 -03:00

An alternative way of reducing the GPU memory usage of models is to use the DeepSpeed ZeRO-3 optimization.

With it, I have been able to load a 6B-parameter model (GPT-J 6B) using less than 6 GB of VRAM. Text generation is reasonably fast, and much faster than what I get with --auto-devices --gpu-memory 6.
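
For orientation, here is a minimal sketch of what a DeepSpeed configuration enabling ZeRO stage 3 with CPU parameter offload can look like. The helper name `build_zero3_config` is hypothetical and the values are illustrative assumptions; this is not necessarily the configuration the web UI builds internally when --deepspeed is passed.

```python
import json


def build_zero3_config(offload_to_cpu: bool = True) -> dict:
    """Return a minimal DeepSpeed config dict with ZeRO stage 3 enabled.

    Hypothetical helper for illustration only; the values are assumptions,
    not the settings used by the web UI.
    """
    zero = {"stage": 3}  # stage 3 partitions the model parameters themselves
    if offload_to_cpu:
        # Keep partitioned parameters in (pinned) CPU RAM to save VRAM.
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "fp16": {"enabled": True},
        "zero_optimization": zero,
        "train_micro_batch_size_per_gpu": 1,
    }


if __name__ == "__main__":
    print(json.dumps(build_zero3_config(), indent=2))
```

Offloading parameters to CPU memory is what lets a model larger than the available VRAM be loaded, at the cost of extra transfers between host and GPU.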

As far as I know, DeepSpeed is only available for Linux at the moment.

How to use it

Install DeepSpeed:

conda install -c conda-forge mpi4py mpich
pip install -U deepspeed
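
After installing, it can help to confirm that both the Python package and the `deepspeed` launcher are visible from the active environment. The following is a small sketch using only the standard library; the function name `deepspeed_status` is hypothetical and not part of DeepSpeed or the web UI.

```python
import importlib.util
import shutil


def deepspeed_status() -> dict:
    """Report whether DeepSpeed looks installed in the current environment."""
    return {
        # True if `import deepspeed` would find the package.
        "module_importable": importlib.util.find_spec("deepspeed") is not None,
        # True if the `deepspeed` launcher script is on PATH.
        "launcher_on_path": shutil.which("deepspeed") is not None,
    }


if __name__ == "__main__":
    for name, ok in deepspeed_status().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If either check reports "no", the conda environment used for the install is probably not the one that is currently active.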

Start the web UI by replacing python with deepspeed --num_gpus=1 in your usual launch command and adding the --deepspeed flag. Example:

deepspeed --num_gpus=1 server.py --deepspeed --chat --model gpt-j-6B
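
If you start the web UI from a script, the same command can be assembled programmatically. This is a minimal sketch: `build_launch_command` is a hypothetical helper, and it only uses the flags shown in the example above.

```python
import shlex
from typing import List, Sequence


def build_launch_command(
    model: str,
    num_gpus: int = 1,
    extra_flags: Sequence[str] = ("--chat",),
) -> List[str]:
    """Build the argv list for launching server.py through the deepspeed launcher."""
    return [
        "deepspeed",
        f"--num_gpus={num_gpus}",  # launcher option, goes before the script name
        "server.py",
        "--deepspeed",             # web UI flag that enables DeepSpeed
        *extra_flags,
        "--model",
        model,
    ]


if __name__ == "__main__":
    # Print the command as a shell-ready string.
    print(shlex.join(build_launch_command("gpt-j-6B")))
```

The returned list can be passed directly to `subprocess.run` from the web UI directory, which avoids shell quoting issues with unusual model names.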

Learn more

For more information, check out this comment by 81300, who came up with the DeepSpeed support in this web UI.
