slaren | 1ca802a3e0 | parallelize fattn compilation test | 2024-05-28 01:19:36 +02:00
Johannes Gäßler | f4003cfba1 | fix nwarps > batch size | 2024-05-27 19:42:16 +02:00
Johannes Gäßler | 3194a01058 | fix commented-out kernel variants | 2024-05-27 19:42:16 +02:00
Johannes Gäßler | 672244a88b | CUDA: quantized KV support for FA vec | 2024-05-27 19:42:16 +02:00
Johannes Gäßler | cd93a28cb1 | CUDA: fix FA out-of-bounds reads (#7479) | 2024-05-23 00:31:20 +02:00
Johannes Gäßler | 38c03478a3 | CUDA: fix FA out-of-bounds writes (#7465) | 2024-05-22 17:58:25 +02:00
Johannes Gäßler | 133d99c599 | CUDA: deduplicate FlashAttention code (#7352) | 2024-05-18 12:36:25 +02:00
Johannes Gäßler | 0fc1e820a9 | CUDA: faster large batch FA without tensor cores (#7314) | 2024-05-17 18:54:52 +02:00
Johannes Gäßler | dc685be466 | CUDA: add FP32 FlashAttention vector kernel (#7188) | 2024-05-12 19:40:45 +02:00
  * CUDA: add FP32 FlashAttention vector kernel
  * fixup! CUDA: add FP32 FlashAttention vector kernel
  * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
  * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel