llama.cpp/src
Andrei 1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 (#8197)
* Add attention and final logit soft-capping (see the sketch after this commit message)

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default values for the attention and final logit softcaps

* Add custom kq scaling from Gemma2Attention

* Remove custom pre-attention scaling and use the computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
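
Soft-capping bounds a logit smoothly instead of clipping it: y = cap * tanh(x / cap), which is near-identity when |x| is much smaller than cap and saturates toward +/-cap for large |x|. The sketch below is a minimal, plain-C++ illustration of the transform, not the ggml graph code from this commit; the cap values (50.0 for attention logits, 30.0 for final logits) are assumptions taken from the published Gemma 2 configuration.

    #include <cmath>
    #include <cstdio>

    // Soft-capping: y = cap * tanh(x / cap).
    // Near-identity for |x| << cap, smoothly saturating toward +/- cap.
    static float soft_cap(float x, float cap) {
        return cap * tanhf(x / cap);
    }

    int main() {
        // Assumed Gemma 2 values: 50.0 for attention logits, 30.0 for final logits.
        const float attn_cap  = 50.0f;
        const float final_cap = 30.0f;

        const float xs[] = {1.0f, 25.0f, 100.0f, 1000.0f};
        for (float x : xs) {
            std::printf("x = %7.1f  attn-capped = %8.4f  final-capped = %8.4f\n",
                        x, soft_cap(x, attn_cap), soft_cap(x, final_cap));
        }
        return 0;
    }

Because the tanh must be applied to the raw attention scores before the softmax, a fused flash-attention kernel offers no place to insert it, which is presumably why this commit also disables flash attention for Gemma2.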
Name              Last commit message                                                                           Last commit date
CMakeLists.txt    llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
llama.cpp         llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 (#8197)   2024-06-29 23:44:08 -04:00
unicode-data.cpp  llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode-data.h    llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode.cpp       llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00
unicode.h         llama : reorganize source code + improve CMake (#8006)                                        2024-06-26 18:33:02 +03:00