mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-11-24 00:56:54 +01:00
Updated Feature matrix (markdown)
parent a659cd1217
commit b1c6785367
@@ -1,11 +1,17 @@
| | **CPU (AVX2)** | **CPU (ARM NEON)** | **Metal** | **cuBLAS** | **rocBLAS** | **SYCL** | **CLBlast** | **Vulkan** | **Kompute** |
|:--------------------:|:--------------:|:------------------:|:---------:|:----------:|:----------------:|:--------:|:-----------:|:----------:|:-----------:|
| **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 |
| **I-quants** | ✅ (SLOW) | ✅ (SLOW) | ✅ (SLOW) | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
| **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
| **I-quants** | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
| **Multi-GPU** | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ (SLOW) | ❓ | ✅ | 🚫 | 🚫 |
| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 |
| **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |

* ✅: feature works
* 🚫: feature does not work
* ❓: unknown, please contribute if you can test it yourself
* 🐢: feature is slow
* ¹: IQ3_S and IQ1_S, see #5886
* ²: Only with `-ngl 0`
* ³: Only `-ctk q8_0`
* ³: Only `-ctk q8_0`, inference is 50% slower
* ⁴: Slower than K-quants of comparable size
* ⁵: Slower than hipBLAS/cuBLAS on similar cards
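
Footnotes ² and ³ restrict two features to specific flag values. As a rough sketch of how those flags might appear on a command line, the snippet below assembles two invocations and prints them without running anything. The binary name `./llama-cli` (older builds called it `./main`) and the model path are assumptions for illustration, not part of the matrix; only `-ngl 0` and `-ctk q8_0` come from the footnotes.

```shell
#!/bin/sh
# Hypothetical binary and model paths; adjust them to your own build.
LLAMA_BIN="./llama-cli"
MODEL="./models/model.gguf"

# Footnote ²: with CLBlast, MoE models only work when no layers are offloaded.
MOE_CMD="$LLAMA_BIN -m $MODEL -ngl 0"

# Footnote ³: with rocBLAS, only a q8_0 K cache is supported.
KCACHE_CMD="$LLAMA_BIN -m $MODEL -ctk q8_0"

# Print the commands instead of executing them, so no model file is needed.
echo "$MOE_CMD"
echo "$KCACHE_CMD"
```

Printing the commands keeps the sketch independent of any particular backend build; remove the `echo` indirection to run them for real.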