From cd9be4c2ba4de3c46e475747d51b207f489591f6 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Tue, 16 May 2023 00:49:32 -0300
Subject: [PATCH] Update llama.cpp-models.md

---
 docs/llama.cpp-models.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/llama.cpp-models.md b/docs/llama.cpp-models.md
index 4ed00dca..153f70af 100644
--- a/docs/llama.cpp-models.md
+++ b/docs/llama.cpp-models.md
@@ -16,11 +16,22 @@ Enabled with the `--n-gpu-layers` parameter. If you have enough VRAM, use a high
 
 Note that you need to manually install `llama-cpp-python` with GPU support. To do that:
 
+#### Linux
+
 ```
 pip uninstall -y llama-cpp-python
 CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
 ```
 
+#### Windows
+
+```
+pip uninstall -y llama-cpp-python
+set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
+set FORCE_CMAKE=1
+pip install llama-cpp-python --no-cache-dir
+```
+
 Here you can find the different compilation options for OpenBLAS / cuBLAS / CLBlast: https://pypi.org/project/llama-cpp-python/
 
 ## Performance
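
One caveat worth flagging on the Windows snippet this patch adds: in `cmd.exe`, `set VAR="value"` stores the quotes as part of the variable's value, which may or may not be handled cleanly when pip forwards `CMAKE_ARGS` to CMake. If the cuBLAS build fails under cmd, a PowerShell equivalent (a sketch, not part of this patch) sidesteps the issue, since PowerShell consumes the quotes rather than storing them:

```
# PowerShell: environment variables are set via $env:, and the quotes
# are interpreted by the shell instead of becoming part of the value.
pip uninstall -y llama-cpp-python
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python --no-cache-dir
```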