mirror of https://github.com/ggerganov/llama.cpp.git (synced 2025-01-27 04:23:06 +01:00)
musa : update doc (#9856)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
parent 96776405a1
commit 943d20b411
@@ -31,7 +31,7 @@ variety of hardware - locally and in the cloud.
 - Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
 - AVX, AVX2 and AVX512 support for x86 architectures
 - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
-- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
+- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads MTT GPUs via MUSA)
 - Vulkan and SYCL backend support
 - CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity

@@ -413,7 +413,7 @@ Please refer to [Build llama.cpp locally](./docs/build.md)
 | [BLAS](./docs/build.md#blas-build) | All |
 | [BLIS](./docs/backend/BLIS.md) | All |
 | [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
-| [MUSA](./docs/build.md#musa) | Moore Threads GPU |
+| [MUSA](./docs/build.md#musa) | Moore Threads MTT GPU |
 | [CUDA](./docs/build.md#cuda) | Nvidia GPU |
 | [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
 | [Vulkan](./docs/build.md#vulkan) | GPU |
@@ -198,6 +198,8 @@ The following compilation options are also available to tweak performance:

 ### MUSA

+This provides GPU acceleration using the MUSA cores of your Moore Threads MTT GPU. Make sure to have the MUSA SDK installed. You can download it from here: [MUSA SDK](https://developer.mthreads.com/sdk/download/musa).
+
 - Using `make`:
 ```bash
 make GGML_MUSA=1
@@ -209,6 +211,12 @@ The following compilation options are also available to tweak performance:
 cmake --build build --config Release
 ```

+The environment variable [`MUSA_VISIBLE_DEVICES`](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/programming_guide/Z%E9%99%84%E5%BD%95/) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted.
+
+Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
+
 ### hipBLAS

 This provides BLAS acceleration on HIP-supported AMD GPUs.
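Reviewer note: the two environment variables added to the MUSA section can be combined at run time. The sketch below is illustrative only and is not part of the commit. It assumes that `MUSA_VISIBLE_DEVICES` accepts a comma-separated list of device indices (by analogy with `CUDA_VISIBLE_DEVICES`), that the CMake build placed `llama-cli` under `build/bin`, and that a model exists at the hypothetical path `./models/model.gguf`.

```bash
#!/usr/bin/env bash
# Assumption: device indices are comma-separated, as with CUDA_VISIBLE_DEVICES.
# Here only the first two Moore Threads GPUs are exposed to llama.cpp.
export MUSA_VISIBLE_DEVICES=0,1

# Linux only: allow spilling to system RAM instead of crashing
# when GPU VRAM is exhausted.
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

# Assumed locations; adjust both to match your build and model.
LLAMA_CLI=./build/bin/llama-cli
MODEL=./models/model.gguf   # hypothetical model path

if [ -x "$LLAMA_CLI" ] && [ -f "$MODEL" ]; then
    # -ngl sets how many layers are offloaded to the GPU.
    "$LLAMA_CLI" -m "$MODEL" -ngl 99 -p "Hello"
else
    echo "llama-cli or model not found; environment is set for the next run"
fi
```

Exporting the variables in the shell, rather than prefixing a single command, keeps the same device selection for every subsequent llama.cpp invocation in that session.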