llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00

History

llama : differentiate the KV dims in the attention (#4657 )

* Add n_key_dim and n_value_dim

Some models use values that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa` because it is not clear
which "head" is referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix llm_load_tensors: the asserts were not backcompat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-01-02 13:51:28 +02:00

__init__.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981 )

2023-11-11 08:04:50 +03:00

constants.py

llama : differentiate the KV dims in the attention (#4657 )

2024-01-02 13:51:28 +02:00

gguf_reader.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981 )

2023-11-11 08:04:50 +03:00

gguf_writer.py

llama : differentiate the KV dims in the attention (#4657 )