llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00


llama : add gemma model (#5631)

There are a couple of notable things in this architecture:

1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.
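
The two points above can be illustrated with a minimal Python sketch. It is not taken from llama.cpp or gguf-py: the `AttnShape` class, the `resolve_output_weight` helper, the tensor names used as dictionary keys, and the numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AttnShape:
    """Hypothetical hyperparameter holder (not a gguf-py class)."""
    n_embd: int
    n_head: int
    key_length: Optional[int] = None    # per-head key size, if stored explicitly
    value_length: Optional[int] = None  # per-head value size, if stored explicitly

    def head_dims(self) -> Tuple[int, int]:
        # Common fallback: derive the head size from n_embd.
        derived = self.n_embd // self.n_head
        k = self.key_length if self.key_length is not None else derived
        v = self.value_length if self.value_length is not None else derived
        return k, v


def resolve_output_weight(tensors: Dict[str, object]) -> object:
    # Shared input/output embeddings: when no separate output matrix is
    # present, reuse the token embedding matrix. The key names here are
    # assumed for this sketch.
    return tensors.get("output.weight", tensors["token_embd.weight"])


# Illustrative numbers only: the explicit head size (256) differs from
# n_embd // n_head (3072 // 16 = 192), so it cannot be derived.
explicit = AttnShape(n_embd=3072, n_head=16, key_length=256, value_length=256)
derived = AttnShape(n_embd=4096, n_head=32)

print(explicit.head_dims())
print(derived.head_dims())
print(resolve_output_weight({"token_embd.weight": "E"}))
```

With explicit lengths, the query projection width is `n_head * key_length` rather than `n_embd`, which is why a loader following this scheme has to read the lengths from metadata instead of computing them.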

More information about the models can be found at
https://ai.google.dev/gemma. GGUFs can be downloaded from
https://huggingface.co/google.

2024-02-21 15:08:22 +02:00

__init__.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)

2023-11-11 08:04:50 +03:00

constants.py

llama : add gemma model (#5631)

2024-02-21 15:08:22 +02:00

gguf_reader.py

gguf : fix "general.alignment" type in gguf_reader.py (#5136)

2024-01-26 11:10:28 +02:00

gguf_writer.py

Use correct type of pooling for embedding models (#5500)