From b8183148cf2cf32ec62200da2f9317bed1ea19a9 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Sun, 22 Oct 2023 17:15:55 -0300
Subject: [PATCH] =?UTF-8?q?Update=2004=20=E2=80=90=20Model=20Tab.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/04 ‐ Model Tab.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/04 ‐ Model Tab.md b/docs/04 ‐ Model Tab.md
index 2e8c63d2..05046fb3 100644
--- a/docs/04 ‐ Model Tab.md
+++ b/docs/04 ‐ Model Tab.md
@@ -90,7 +90,7 @@ Example: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF
 * **threads**: Number of threads. Recommended value: your number of physical cores.
 * **threads_batch**: Number of threads for batch processing. Recommended value: your total number of cores (physical + virtual).
 * **n_batch**: Batch size for prompt processing. Higher values are supposed to make generation faster, but I have never obtained any benefit from changing this value.
-* **mul_mat_q**: Disable the mul_mat_q kernel. This kernel usually improves generation speed significantly. This option to disable it is included in case it doesn't work on some system.
+* **no_mul_mat_q**: Disable the mul_mat_q kernel. This kernel usually improves generation speed significantly. This option to disable it is included in case it doesn't work on some system.
 * **no-mmap**: Loads the model into memory at once, possibly preventing I/O operations later on at the cost of a longer load time.
 * **mlock**: Force the system to keep the model in RAM rather than swapping or compressing (no idea what this means, never used it).
 * **numa**: May improve performance on certain multi-cpu systems.