Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-12-26 14:20:31 +01:00
Commit 1c5eba6f8e

* Add attention and final logit softcapping.
* fix
* Add custom add_ functions
* Disable flash attention for Gemma2
* Update src/llama.cpp (Co-authored-by: slaren <slarengh@gmail.com>)
* Add default value for attention and final logit softcap value
* Add custom kq scaling from Gemma2Attention
* Remove custom pre attention scaling and use computed value instead.

Co-authored-by: slaren <slarengh@gmail.com>
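For context, the softcapping the commit refers to is the tanh-based squashing that Gemma 2 applies to its attention scores and to the final output logits. Below is a minimal C++ sketch of that operation; the helper name `soft_cap` and the cap values (50.0 for attention scores, 30.0 for final logits) are illustrative assumptions based on the published Gemma 2 defaults, not the exact llama.cpp code added by this commit.

```cpp
#include <cmath>
#include <vector>

// Soft-capping squashes values into (-cap, +cap) with tanh instead of hard
// clipping, so the mapping stays smooth and order-preserving:
//     x <- cap * tanh(x / cap)
static void soft_cap(std::vector<float> & logits, float cap) {
    for (float & x : logits) {
        x = cap * std::tanh(x / cap);
    }
}

// Assumed Gemma 2 defaults (illustrative, matching the published model config):
//   attention score cap: 50.0f
//   final logit cap:     30.0f
```

Disabling flash attention for Gemma2, as listed above, is likely needed because the fused attention path does not apply this extra tanh to the attention scores before the softmax.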
Files:

CMakeLists.txt
llama.cpp
unicode-data.cpp
unicode-data.h
unicode.cpp
unicode.h