llama.cpp/ggml-cuda/template-instances
Johannes Gäßler 7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs64-f16-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-f16-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q4_1-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q5_1-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs128-q8_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f16-instance-hs256-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs64-f16-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-f16-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q4_1-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q5_1-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs128-q8_0-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-vec-f32-instance-hs256-f16-f16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-wmma-f16-instance-kqfloat-cpb16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-wmma-f16-instance-kqfloat-cpb32.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-wmma-f16-instance-kqhalf-cpb8.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-wmma-f16-instance-kqhalf-cpb16.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
fattn-wmma-f16-instance-kqhalf-cpb32.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
generate_cu_files.py CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q2_k.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q3_k.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q4_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q4_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q4_k.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q5_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q5_1.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q5_k.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q6_k.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
mmq-instance-q8_0.cu CUDA: refactor mmq, dmmv, mmvq (#7716) 2024-06-05 16:53:00 +02:00
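
The file names above follow a regular pattern, which the short Python sketch below enumerates. It is not the contents of generate_cu_files.py, which this listing does not show. The names QUANT_TYPES, MMQ_TYPES, WMMA_VARIANTS and instance_filenames are hypothetical, and the rules (head size 64 pairs an f16 K type with every V type, head size 128 covers every K/V pair, head size 256 is f16/f16 only) are inferred solely from the names visible here.

```python
from itertools import product

# Type lists inferred from the file names in the listing (assumption, not
# taken from generate_cu_files.py).
QUANT_TYPES = ["f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0"]
MMQ_TYPES = ["q2_k", "q3_k", "q4_0", "q4_1", "q4_k",
             "q5_0", "q5_1", "q5_k", "q6_k", "q8_0"]
# (KQ accumulator type, cpb value) pairs as they appear in the wmma file names.
WMMA_VARIANTS = [("float", 16), ("float", 32),
                 ("half", 8), ("half", 16), ("half", 32)]


def instance_filenames():
    """Return the .cu file names of the listing, in listing order."""
    names = []
    for vkq in ("f16", "f32"):
        # Head size 64: K is always f16, V ranges over all types.
        for type_v in QUANT_TYPES:
            names.append(f"fattn-vec-{vkq}-instance-hs64-f16-{type_v}.cu")
        # Head size 128: every K/V type combination.
        for type_k, type_v in product(QUANT_TYPES, QUANT_TYPES):
            names.append(f"fattn-vec-{vkq}-instance-hs128-{type_k}-{type_v}.cu")
        # Head size 256: f16/f16 only.
        names.append(f"fattn-vec-{vkq}-instance-hs256-f16-f16.cu")
    for kq, cpb in WMMA_VARIANTS:
        names.append(f"fattn-wmma-f16-instance-kq{kq}-cpb{cpb}.cu")
    for quant in MMQ_TYPES:
        names.append(f"mmq-instance-{quant}.cu")
    return names


if __name__ == "__main__":
    for name in instance_filenames():
        print(name)
```

Under these assumptions the function yields 43 fattn-vec names per accumulator type, 5 fattn-wmma names and 10 mmq names, which matches the 101 .cu entries listed above.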