llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 05:17:21 +01:00


Ștefan-Gabriel Muscalu a94e6ff877

* update: convert-hf-to-gguf.py to support Qwen2-57B-A14B

* fix: QWEN2MOE support for expert_feed_forward_length

Previously, the expert FF length was taken from n_ff (the intermediate size); it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH.

n_ff_exp and n_ff_shared_exp are now properly calculated

* update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM

* fix: QWEN2MOE support for expert_feed_forward_length

Previously, the expert FF length was taken from n_ff (the intermediate size); it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH.

n_ff_exp and n_ff_shexp are now properly calculated
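The fix described in this commit separates the per-expert and shared-expert feed-forward lengths from the dense `n_ff`. The following is a minimal Python sketch of that selection logic under stated assumptions. The HF config fields `intermediate_size`, `moe_intermediate_size` and `shared_expert_intermediate_size` are the names used by Qwen2-MoE configs in Hugging Face Transformers. The `MoeFFLengths` class, both helper functions, and the fallback to `n_ff` when the expert values are absent are hypothetical illustrations, not code from llama.cpp or gguf-py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MoeFFLengths:
    """Hypothetical container for the FF sizes a converter might record."""
    n_ff: int                         # dense intermediate size
    n_ff_exp: Optional[int] = None    # per-expert FF length, if known
    n_ff_shexp: Optional[int] = None  # shared-expert FF length, if known


def from_hf_config(config: dict) -> MoeFFLengths:
    # Read sizes from a config.json-style dict; missing expert fields stay None.
    return MoeFFLengths(
        n_ff=config["intermediate_size"],
        n_ff_exp=config.get("moe_intermediate_size"),
        n_ff_shexp=config.get("shared_expert_intermediate_size"),
    )


def resolve_ff_lengths(hp: MoeFFLengths) -> Tuple[int, int]:
    # Prefer the explicit expert lengths; fall back to n_ff (the old
    # behaviour described in the commit) only when they are absent.
    n_ff_exp = hp.n_ff_exp if hp.n_ff_exp is not None else hp.n_ff
    n_ff_shexp = hp.n_ff_shexp if hp.n_ff_shexp is not None else hp.n_ff
    return n_ff_exp, n_ff_shexp


# Illustrative numbers only, not the real Qwen2-57B-A14B config.
cfg = {"intermediate_size": 1024, "moe_intermediate_size": 256,
       "shared_expert_intermediate_size": 512}
print(resolve_ff_lengths(from_hf_config(cfg)))  # (256, 512)
```

Keeping the expert fields `Optional` lets the loader tell "not stored in the file" apart from a real value, which is exactly the distinction the old `n_ff`-only path could not make.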

2024-06-17 21:08:46 +02:00

__init__.py

convert-hf : support direct Q8_0 conversion (#7234)

2024-05-13 14:10:51 -04:00
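Q8_0 is ggml's 8-bit block format. Each block of 32 values stores one fp16 scale `d = max|x| / 127` followed by 32 signed 8-bit integers `round(x / d)`, for 34 bytes per block. The sketch below quantizes and dequantizes a single block in pure Python. It is modeled on the ggml reference behaviour, but it is not gguf-py's converter code, and the function names are hypothetical.

```python
import math
import struct
from typing import List

QK8_0 = 32  # values per Q8_0 block


def _round_half_away(x: float) -> int:
    # C roundf() semantics; Python's round() uses banker's rounding.
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(values: List[float]) -> bytes:
    """Pack 32 floats as one Q8_0 block: fp16 scale + 32 int8 values."""
    if len(values) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    d = amax / 127.0               # largest |value| maps to +/-127
    inv_d = 1.0 / d if d else 0.0  # all-zero block -> all-zero quants
    qs = [max(-127, min(127, _round_half_away(v * inv_d))) for v in values]
    # "<e" is IEEE half precision; raises OverflowError if d exceeds fp16 range.
    return struct.pack("<e", d) + struct.pack(f"<{QK8_0}b", *qs)


def dequantize_q8_0_block(block: bytes) -> List[float]:
    (d,) = struct.unpack_from("<e", block, 0)           # fp16 scale
    qs = struct.unpack_from(f"<{QK8_0}b", block, 2)     # 32 int8 quants
    return [d * q for q in qs]


vals = [float(i - 16) for i in range(QK8_0)]
blk = quantize_q8_0_block(vals)
print(len(blk))  # 34
```

The quantization step uses the full-precision scale, and only the stored copy is rounded to fp16. The reconstruction error per value therefore stays within about half a quantization step plus the fp16 rounding of `d`.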

constants.py

update: support Qwen2-57B-A14B (#7835)

2024-06-17 21:08:46 +02:00

gguf_reader.py

gguf-py : fix and simplify quantized shape round-trip (#7483)

2024-05-25 11:11:48 +10:00
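A quantized tensor can be described by its logical element shape or by the shape of its raw bytes. A "shape round-trip" converts between the two without losing information. The sketch below assumes row-major (NumPy-order) shapes in which only the last axis is split into blocks. The function names and the `BLOCK_SIZES` table are hypothetical and cover only Q4_0 and Q8_0, so this illustrates the arithmetic rather than gguf-py's implementation.

```python
from typing import Dict, Tuple

# (values per block, bytes per block); both types use an fp16 scale.
BLOCK_SIZES: Dict[str, Tuple[int, int]] = {
    "Q4_0": (32, 2 + 16),  # fp16 scale + 32 packed 4-bit values
    "Q8_0": (32, 2 + 32),  # fp16 scale + 32 int8 values
}


def logical_to_byte_shape(shape: Tuple[int, ...], qtype: str) -> Tuple[int, ...]:
    # Replace the element count of the last axis by its size in bytes.
    block_size, type_size = BLOCK_SIZES[qtype]
    if shape[-1] % block_size:
        raise ValueError(f"last dim {shape[-1]} is not a multiple of {block_size}")
    return (*shape[:-1], shape[-1] // block_size * type_size)


def byte_to_logical_shape(shape: Tuple[int, ...], qtype: str) -> Tuple[int, ...]:
    # Inverse: recover the element count from the byte length of the last axis.
    block_size, type_size = BLOCK_SIZES[qtype]
    if shape[-1] % type_size:
        raise ValueError(f"last dim {shape[-1]} is not a multiple of {type_size}")
    return (*shape[:-1], shape[-1] // type_size * block_size)


b = logical_to_byte_shape((4096, 4096), "Q8_0")
print(b)                                 # (4096, 4352)
print(byte_to_logical_shape(b, "Q8_0"))  # (4096, 4096)
```

Because both directions divide exactly or raise, a shape that converts one way always converts back to the original.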

gguf_writer.py

update: support Qwen2-57B-A14B (#7835)