llama.cpp/src
Andrei 1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 (#8197)
* Add attention and final logit soft-capping (see the sketch after this commit message)

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default values for the attention and final logit softcaps

* Add custom kq scaling from Gemma2Attention

* Remove custom pre-attention scaling and use the computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
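
Soft-capping bounds a logit smoothly instead of clipping it: y = cap * tanh(x / cap), which is near-identity when |x| is much smaller than cap and saturates toward +/-cap for large |x|. The sketch below is a minimal, plain-C++ illustration of the transform, not the ggml graph code from this commit; the cap values (50.0 for attention logits, 30.0 for final logits) are assumptions taken from the published Gemma 2 configuration.

    #include <cmath>
    #include <cstdio>

    // Soft-capping: y = cap * tanh(x / cap).
    // Near-identity for |x| << cap, smoothly saturating toward +/- cap.
    static float soft_cap(float x, float cap) {
        return cap * tanhf(x / cap);
    }

    int main() {
        // Assumed Gemma 2 values: 50.0 for attention logits, 30.0 for final logits.
        const float attn_cap  = 50.0f;
        const float final_cap = 30.0f;

        const float xs[] = {1.0f, 25.0f, 100.0f, 1000.0f};
        for (float x : xs) {
            std::printf("x = %7.1f  attn-capped = %8.4f  final-capped = %8.4f\n",
                        x, soft_cap(x, attn_cap), soft_cap(x, final_cap));
        }
        return 0;
    }

Because the tanh must be applied to the raw attention scores before the softmax, a fused flash-attention kernel offers no place to insert it, which is presumably why this commit also disables flash attention for Gemma2.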
Name              Last commit message                                                                           Last commit date
CMakeLists.txt    llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
llama.cpp         llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 (#8197)   2024-06-29 23:44:08 -04:00
unicode-data.cpp  llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode-data.h    llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode.cpp       llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode.h         llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00