From 47666c4d00275a09962d6177fed3a6a3a2be11fe Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Sat, 22 Apr 2023 15:12:14 -0300
Subject: [PATCH] Update GPTQ-models-(4-bit-mode).md

---
 docs/GPTQ-models-(4-bit-mode).md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/GPTQ-models-(4-bit-mode).md b/docs/GPTQ-models-(4-bit-mode).md
index f8429e6a..679cabee 100644
--- a/docs/GPTQ-models-(4-bit-mode).md
+++ b/docs/GPTQ-models-(4-bit-mode).md
@@ -11,7 +11,7 @@ Different branches of GPTQ-for-LLaMa are available:
 | Branch | Comment |
 |----|----|
 | [Old CUDA branch (recommended)](https://github.com/oobabooga/GPTQ-for-LLaMa/) | The fastest branch, works on Windows and Linux. |
-| [Up-to-date triton branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa) | Slightly more precise than the old CUDA branch, 2x slower for small context size, only works on Linux. |
+| [Up-to-date triton branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa) | Slightly more precise than the old CUDA branch for 13b models and up, and significantly more precise for 7b. 2x slower for small context sizes, and only works on Linux. |
 | [Up-to-date CUDA branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda) | As precise as the up-to-date triton branch, 10x slower than the old cuda branch for small context size. |
 
 Overall, I recommend using the old CUDA branch. It is included by default in the one-click-installer for this web UI.
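
As context for reviewers, here is a minimal sketch of how each branch in the table above could be fetched by hand. The repository URLs come from the table itself; the `setup_cuda.py` build step is an assumption based on how the GPTQ-for-LLaMa forks were typically installed at the time, so check each branch's README before running it.

```
# Old CUDA branch (recommended): fastest, works on Windows and Linux
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install   # assumed build step; verify against the repo's README
cd ..

# Up-to-date triton branch: Linux only
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa GPTQ-for-LLaMa-triton

# Up-to-date CUDA branch of the same repo
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa GPTQ-for-LLaMa-cuda
```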