llama.cpp/ggml-cuda/template-instances
Johannes Gäßler 9b596417af
CUDA: quantized KV support for FA vec (#7527)
* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code
2024-06-01 08:44:14 +02:00
..