Commit Graph

6 Commits

Author           SHA1        Message                                                   Date
Johannes Gäßler  cd93a28cb1  CUDA: fix FA out-of-bounds reads (#7479)                  2024-05-23 00:31:20 +02:00
Johannes Gäßler  38c03478a3  CUDA: fix FA out-of-bounds writes (#7465)                 2024-05-22 17:58:25 +02:00
Georgi Gerganov  9b3d833189  cuda : fix compile warning (#7454)                        2024-05-22 12:36:37 +03:00
Johannes Gäßler  95fb0aefab  CUDA: remove incorrect precision check (#7454)            2024-05-22 10:24:29 +02:00
Johannes Gäßler  133d99c599  CUDA: deduplicate FlashAttention code (#7352)             2024-05-18 12:36:25 +02:00
Johannes Gäßler  0fc1e820a9  CUDA: faster large batch FA without tensor cores (#7314)  2024-05-17 18:54:52 +02:00