diff --git a/Feature-matrix.md b/Feature-matrix.md
index 212dcb9..d6cb03e 100644
--- a/Feature-matrix.md
+++ b/Feature-matrix.md
@@ -3,7 +3,7 @@
 | **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
 | **I-quants** | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
 | **Multi-GPU** | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
-| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 |
+| **K cache quants** | ✅ | ❓ | ❓ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
 | **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
 
 * ✅: feature works
@@ -12,6 +12,7 @@
 * 🐢: feature is slow
 * ¹: IQ3_S and IQ1_S, see #5886
 * ²: Only with `-ngl 0`
-* ³: Only `-ctk q8_0`, inference is 50% slower
+* ³: Inference is 50% slower
 * ⁴: Slower than K-quants of comparable size
-* ⁵: Slower than cuBLAS/rocBLAS on similar cards
\ No newline at end of file
+* ⁵: Slower than cuBLAS/rocBLAS on similar cards
+* ⁶: Only q8_0 and iq4_nl
\ No newline at end of file