llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-02-05 08:00:42 +01:00

postmasters 83e633c27e	2024-01-02 13:51:28 +02:00

llama : differentiate the KV dims in the attention (#4657)

* Add n_key_dim and n_value_dim

  Some models use values that are not derived from `n_embd`. Also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value). Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`
* Rebase
* Rename variables
* Fix llm_build_kqv to be more generic wrt n_embd_head_k
* Update default values for n_embd_head_k and n_embd_head_v
* Fix llm_load_tensors: the asserts were not backcompat

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
__init__.py	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)	2023-11-11 08:04:50 +03:00
constants.py	llama : differentiate the KV dims in the attention (#4657)	2024-01-02 13:51:28 +02:00
gguf_reader.py	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)	2023-11-11 08:04:50 +03:00
gguf_writer.py	llama : differentiate the KV dims in the attention (#4657)	2024-01-02 13:51:28 +02:00
gguf.py	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)	2023-11-11 08:04:50 +03:00
py.typed	convert : various script cleanups/fixes + merges and special token handling (#2842)	2023-08-30 11:25:50 +03:00
tensor_mapping.py	gpt2 : Add gpt2 architecture integration (#4555)	2023-12-28 15:03:57 +01:00
vocab.py	py : open merges file as 'utf-8' (#4566)	2023-12-21 19:07:34 +02:00