Update GPTQ-models-(4-bit-mode).md
commit 47666c4d00
parent fcb594b90e
@@ -11,7 +11,7 @@ Different branches of GPTQ-for-LLaMa are available:
 | Branch | Comment |
 |----|----|
 | [Old CUDA branch (recommended)](https://github.com/oobabooga/GPTQ-for-LLaMa/) | The fastest branch, works on Windows and Linux. |
-| [Up-to-date triton branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa) | Slightly more precise than the old CUDA branch, 2x slower for small context size, only works on Linux. |
+| [Up-to-date triton branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa) | Slightly more precise than the old CUDA branch from 13b upwards, significantly more precise for 7b. 2x slower for small context size and only works on Linux. |
 | [Up-to-date CUDA branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda) | As precise as the up-to-date triton branch, 10x slower than the old cuda branch for small context size. |
 
 Overall, I recommend using the old CUDA branch. It is included by default in the one-click-installer for this web UI.
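Since the recommended branch ships with the one-click installer, manual setup is only needed for a from-source install. Below is a minimal sketch of what that looks like, assuming the web UI's usual `repositories/` layout, an active Python environment, and a working CUDA toolchain; the `setup_cuda.py` build step reflects the GPTQ-for-LLaMa layout around the time of this commit and may have changed since.

```bash
# Sketch of a manual install of the recommended old CUDA branch.
# Assumes you are inside the text-generation-webui folder, with its
# conda/venv environment active and a CUDA compiler available.
mkdir -p repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install  # compiles the quant_cuda extension
```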