Johannes Gäßler 9b596417af
CUDA: quantized KV support for FA vec (#7527)
* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code
2024-06-01 08:44:14 +02:00
..
2024-03-09 14:17:11 +02:00
2024-01-29 15:50:50 -05:00