llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 05:17:21 +01:00

Add support for ArcticForCausalLM (#7020)

* common : increase max number of experts to 128

* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn

* gguf-py : add architecture-specific block mappings that override selected general block mappings

* convert-hf : add model conversion support for ArcticForCausalLM

* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)

* llama : add inference support for LLM_ARCH_ARCTIC

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

2024-05-24 14:31:13 +02:00

__init__.py

convert-hf : support direct Q8_0 conversion (#7234)

2024-05-13 14:10:51 -04:00

constants.py

Add support for ArcticForCausalLM (#7020)

2024-05-24 14:31:13 +02:00

gguf_reader.py

convert-hf : save memory with lazy evaluation (#7075)

2024-05-08 18:16:38 -04:00

gguf_writer.py

llama : add phi3 128K model support (#7225)