Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2025-01-10 12:30:50 +01:00)
e11bd856d5
* CPU/CUDA: Gemma 2 FlashAttention support
* apply logit_softcap to scale in kernel
* disable logit softcapping tests on Metal
* remove metal check
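The second bullet refers to Gemma 2's attention logit softcapping, logits = softcap * tanh(logits / softcap). A minimal CPU-side sketch of how the 1/softcap division can be folded into the existing attention scale, which is one plausible reading of "apply logit_softcap to scale in kernel"; the constants and function names below are illustrative, not the actual llama.cpp kernel code:

```cpp
// Sketch only: Gemma 2-style attention logit softcapping,
// logits = softcap * tanh(logits / softcap), applied to a raw Q.K dot product.
#include <cmath>
#include <cstdio>
#include <vector>

// Reference form: scale the dot product, then softcap the logit.
static float softcap_reference(float dot, float scale, float softcap) {
    float logit = dot * scale;
    return softcap * std::tanh(logit / softcap);
}

// Folded form: pre-divide the scale by softcap once, so the inner loop
// needs only a multiply and a tanh (no extra division per logit).
static float softcap_folded(float dot, float scale, float softcap) {
    float scaled = dot * (scale / softcap); // scale / softcap can be precomputed
    return softcap * std::tanh(scaled);
}

int main() {
    const float scale   = 1.0f / std::sqrt(256.0f); // hypothetical head size
    const float softcap = 50.0f;                    // hypothetical softcap value
    std::vector<float> dots = {-800.0f, -3.0f, 0.0f, 3.0f, 800.0f};
    for (float d : dots) {
        std::printf("dot=%8.1f  ref=%9.5f  folded=%9.5f\n",
                    d, softcap_reference(d, scale, softcap),
                    softcap_folded(d, scale, softcap));
    }
    return 0;
}
```

Both forms produce the same values and keep every logit bounded in (-softcap, softcap), which is the point of the softcapping step before softmax.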