slaren | 1ca802a3e0 | parallelize fattn compilation test | 2024-05-28 01:19:36 +02:00 |
Johannes Gäßler | 462add6a01 | try CI fix | 2024-05-27 19:42:16 +02:00 |
Johannes Gäßler | 672244a88b | CUDA: quantized KV support for FA vec | 2024-05-27 19:42:16 +02:00 |
Johannes Gäßler | 133d99c599 | CUDA: deduplicate FlashAttention code (#7352) | 2024-05-18 12:36:25 +02:00 |
Johannes Gäßler | dc685be466 | CUDA: add FP32 FlashAttention vector kernel (#7188) * CUDA: add FP32 FlashAttention vector kernel * fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel | 2024-05-12 19:40:45 +02:00 |