llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 13:27:21 +01:00


llama : add support for BitnetForCausalLM (#7931)

* hf bitnet v1

* hf bitnet e2e v2

* finish bitnet e2e

* finish f16 hf bitnet e2e

* remove unsed

* finish bitnet i2 e2e

* move i2s to quantize v1

* move i2 to quantize

* clean code

* clean code 2

* fix codestyle

* fix code

* fix

* fix code

* fix merge

* remove unused

* change table name

* fix whitespace

* delete redundant

* i2_s to absmax

* finish i2_s/i8_s vec_dot x86 simd

* i2s->q22

* fix code

* remove block scale

* add dequantize

* fix seq

* update avx2

* remove q2_2

* remove q22_grid

* fix whitespace

* reuse llm_build_kv

* fix bo

---------

Co-authored-by: root <root@wangjinheng>

2024-06-23 21:27:57 +03:00

__init__.py

convert-hf : support direct Q8_0 conversion (#7234)

2024-05-13 14:10:51 -04:00

constants.py

llama : add support for BitnetForCausalLM (#7931)

2024-06-23 21:27:57 +03:00

gguf_reader.py

gguf-py : fix and simplify quantized shape round-trip (#7483)

2024-05-25 11:11:48 +10:00

gguf_writer.py

update: support Qwen2-57B-A14B (#7835)