diff --git a/Feature-matrix.md b/Feature-matrix.md
index ba9b732..29e6fd6 100644
--- a/Feature-matrix.md
+++ b/Feature-matrix.md
@@ -1,11 +1,17 @@
 | | **CPU (AVX2)** | **CPU (ARM NEON)** | **Metal** | **cuBLAS** | **rocBLAS** | **SYCL** | **CLBlast** | **Vulkan** | **Kompute** |
 |:--------------------:|:--------------:|:------------------:|:---------:|:----------:|:----------------:|:--------:|:-----------:|:----------:|:-----------:|
-| **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 |
-| **I-quants** | ✅ (SLOW) | ✅ (SLOW) | ✅ (SLOW) | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
+| **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
+| **I-quants** | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
 | **Multi-GPU** | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
-| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ (SLOW) | ❓ | ✅ | 🚫 | 🚫 |
+| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 |
 | **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
 
+* ✅: feature works
+* 🚫: feature does not work
+* ❓: unknown; please contribute if you can test it yourself
+* 🐢: feature is slow
 * ¹: IQ3_S and IQ1_S, see #5886
 * ²: Only with `-ngl 0`
-* ³: Only `-ctk q8_0`
\ No newline at end of file
+* ³: Only `-ctk q8_0`, inference is 50% slower
+* ⁴: Slower than K-quants of comparable size
+* ⁵: Slower than hipBLAS/cuBLAS on similar cards
\ No newline at end of file