Update README

2025-01-23 10:09:20 +01:00 · 2023-06-23 12:24:43 -03:00 · 0f9088f730
commit 0f9088f730
parent 3ae9af01aa
1 changed file with 1 addition and 0 deletions
--- a/README.md
+++ b/README.md
@@ -258,6 +258,7 @@ Optionally, you can use the following command-line flags:
 | `--triton`                     | Use triton. |
 | `--no_inject_fused_attention`  | Disable the use of fused attention, which will use less VRAM at the cost of slower inference. |
 | `--no_inject_fused_mlp`        | Triton mode only: disable the use of fused MLP, which will use less VRAM at the cost of slower inference. |
+| `--no_use_cuda_fp16`           | This can make models faster on some systems. |
 | `--desc_act`                   | For models that don't have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig. |

 #### ExLlama
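
The new `--no_use_cuda_fp16` row documents a negative flag, which suggests the underlying option is on by default and the flag turns it off. A minimal sketch of how such a flag might be parsed and inverted, assuming a loader that accepts a `use_cuda_fp16` keyword (the `build_parser` and `loader_kwargs` helpers and that keyword name are hypothetical and not taken from this commit):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the flag added in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--no_use_cuda_fp16",
        action="store_true",  # absent -> False, present -> True
        help="This can make models faster on some systems.",
    )
    return parser


def loader_kwargs(argv: list[str]) -> dict[str, bool]:
    """Translate the negative CLI flag into a positive loader option.

    The key name `use_cuda_fp16` is an assumption made for illustration.
    """
    args = build_parser().parse_args(argv)
    return {"use_cuda_fp16": not args.no_use_cuda_fp16}


if __name__ == "__main__":
    print(loader_kwargs([]))
    print(loader_kwargs(["--no_use_cuda_fp16"]))
```

Using `store_true` on the negative flag keeps the default behaviour unchanged for existing users, who only need to pass the flag to opt out.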