Updated Feature matrix (markdown)

2024-11-24 09:06:52 +01:00 · 2024-03-05 12:45:31 +00:00 · b1c6785367
commit b1c6785367
parent a659cd1217
1 changed file with 10 additions and 4 deletions
--- a/Feature-matrix.md
+++ b/Feature-matrix.md
@@ -1,11 +1,17 @@
 |                      | **CPU (AVX2)** | **CPU (ARM NEON)** | **Metal** | **cuBLAS** |    **rocBLAS**   | **SYCL** | **CLBlast** | **Vulkan** | **Kompute** |
 |:--------------------:|:--------------:|:------------------:|:---------:|:----------:|:----------------:|:--------:|:-----------:|:----------:|:-----------:|
-| **K-quants**         | ✅              | ✅                  | ✅         | ✅          | ✅                | ✅        | ✅           | ✅          | 🚫           |
+| **K-quants**         | ✅              | ✅                  | ✅         | ✅          | ✅                | ✅        | ✅ 🐢⁵          | ✅ 🐢⁵         | 🚫           |
-| **I-quants**         | ✅ (SLOW)       | ✅ (SLOW)           | ✅ (SLOW)  | ✅          | ✅                | Partial¹        | 🚫           | 🚫          | 🚫           |
+| **I-quants**         | ✅ 🐢⁴       | ✅ 🐢⁴           | ✅ 🐢⁴ | ✅          | ✅                | Partial¹        | 🚫           | 🚫          | 🚫           |
 | **Multi-GPU**        | N/A            | N/A                | N/A       | ✅          | ❓                | 🚫        | ❓           | ✅          | ❓           |
-|  **K cache quants**  | ✅              | ❓                  | ❓         | ✅          | Partial³ (SLOW) | ❓        | ✅           | 🚫          | 🚫           |
+|  **K cache quants**  | ✅              | ❓                  | ❓         | ✅          | Partial³ 🐢 | ❓        | ✅           | 🚫          | 🚫           |
 | **MoE architecture** | ✅              | ❓                  | ✅         | ✅          | ✅                | ❓        | Partial² | 🚫          | 🚫           |
 * ✅: feature works
 * 🚫: feature does not work
 * ❓: unknown, please contribute if you can test it yourself
 * 🐢: feature is slow
 * ¹: IQ3_S and IQ1_S, see #5886
 * ²: Only with `-ngl 0`
-* ³: Only `-ctk q8_0`
+* ³: Only `-ctk q8_0`, inference is 50% slower
 * ⁴: Slower than K-quants of comparable size
 * ⁵: Slower than hipBLAS/cuBLAS on similar cards