mirror of https://github.com/oobabooga/text-generation-webui.git, synced 2024-12-24 13:28:59 +01:00
Update README
This commit is contained in: commit 0f9088f730 (parent 3ae9af01aa)
@@ -258,6 +258,7 @@ Optionally, you can use the following command-line flags:
 | `--triton` | Use triton. |
 | `--no_inject_fused_attention` | Disable the use of fused attention, which will use less VRAM at the cost of slower inference. |
 | `--no_inject_fused_mlp` | Triton mode only: disable the use of fused MLP, which will use less VRAM at the cost of slower inference. |
+| `--no_use_cuda_fp16` | This can make models faster on some systems. |
 | `--desc_act` | For models that don't have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig. |

 #### ExLlama
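For context, a minimal usage sketch (not part of the commit itself): the flag this diff adds would be passed to the webui's `server.py` launcher alongside the other AutoGPTQ options. `MODEL_NAME` is a placeholder, and the `--autogptq` loader switch is an assumption based on the webui version this commit targets.

```sh
# Minimal sketch, assuming the webui version this commit targets.
# MODEL_NAME is a placeholder for a GPTQ-quantized model directory under models/.
# --autogptq selects the AutoGPTQ loader (assumed); --no_use_cuda_fp16 is the
# flag documented by this README change.
python server.py --model MODEL_NAME --autogptq --no_use_cuda_fp16
```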