Updated Feature matrix (markdown)

Romain D 2024-03-21 17:22:48 +00:00
parent e4f77876f9
commit 3532deb5dc

@@ -3,7 +3,7 @@
| **K-quants** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
| **I-quants** | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
| **Multi-GPU** | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
-| **K cache quants** | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 |
+| **K cache quants** | ✅ | ❓ | ❓ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
| **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
* ✅: feature works
@@ -12,6 +12,7 @@
* 🐢: feature is slow
* ¹: IQ3_S and IQ1_S, see #5886
* ²: Only with `-ngl 0`
-* ³: Only `-ctk q8_0`, inference is 50% slower
+* ³: Inference is 50% slower
* ⁴: Slower than K-quants of comparable size
* ⁵: Slower than cuBLAS/rocBLAS on similar cards
+* ⁶: Only q8_0 and iq4_nl