llama.cpp/fattn-vec-f32-instance-hs128-f16-q4_0.cu at e141ce624af57bdffbaf57014a044eb1d9689230

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-10-31 15:10:16 +01:00

Johannes Gäßler 9b596417af

CUDA: quantized KV support for FA vec (#7527)

* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code

2024-06-01 08:44:14 +02:00

// This file has been autogenerated by generate-variants.py, do not edit manually.
#include "../fattn-vec-f32.cuh"
DECL_FATTN_VEC_F32_CASE(128, GGML_TYPE_F16, GGML_TYPE_Q4_0);