From 3532deb5dcc4a17ce2cd916d66b84875b5e604c3 Mon Sep 17 00:00:00 2001
From: Romain D <90720+Artefact2@users.noreply.github.com>
Date: Thu, 21 Mar 2024 17:22:48 +0000
Subject: [PATCH] Updated Feature matrix (markdown)

---
 Feature-matrix.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Feature-matrix.md b/Feature-matrix.md
index 212dcb9..d6cb03e 100644
--- a/Feature-matrix.md
+++ b/Feature-matrix.md
@@ -3,7 +3,7 @@
 | **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
 | **I-quants** | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
 | **Multi-GPU** | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
-| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 |
+| **K cache quants** | ✅ | ❓ | ❓ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
 | **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
 
 * ✅: feature works
@@ -12,6 +12,7 @@
 * 🐢: feature is slow
 * ¹: IQ3_S and IQ1_S, see #5886
 * ²: Only with `-ngl 0`
-* ³: Only `-ctk q8_0`, inference is 50% slower
+* ³: Inference is 50% slower
 * ⁴: Slower than K-quants of comparable size
-* ⁵: Slower than cuBLAS/rocBLAS on similar cards
\ No newline at end of file
+* ⁵: Slower than cuBLAS/rocBLAS on similar cards
+* ⁶: Only q8_0 and iq4_nl
\ No newline at end of file