Johannes Gäßler 1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-10 11:45:13 +02:00