From 0f9088f730979a43bf8668034cae16d38c1662bb Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Fri, 23 Jun 2023 12:24:43 -0300
Subject: [PATCH] Update README

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 8d771dad..8199639c 100644
--- a/README.md
+++ b/README.md
@@ -258,6 +258,7 @@ Optionally, you can use the following command-line flags:
 | `--triton` | Use triton. |
 | `--no_inject_fused_attention` | Disable the use of fused attention, which will use less VRAM at the cost of slower inference. |
 | `--no_inject_fused_mlp` | Triton mode only: disable the use of fused MLP, which will use less VRAM at the cost of slower inference. |
+| `--no_use_cuda_fp16` | Disable the use of CUDA fp16. This can make models faster on some systems. |
 | `--desc_act` | For models that don't have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig. |
 
 #### ExLlama
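
For context, a minimal usage sketch of the flag documented by this patch, assuming text-generation-webui's standard `server.py` entry point with an AutoGPTQ-quantized model (the model name below is a placeholder):

```
python server.py --model <your-gptq-model> --no_use_cuda_fp16
```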