# perplexity
TODO
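The scores below are perplexity values. As a rough sketch (the tool's exact chunking and context handling may differ), perplexity over a stream of N tokens is the exponential of the average negative log-likelihood:

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)$$

Lower is better; the smaller the gap to the fp16 score, the less quality the quantization loses.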
## Llama 2 70B Scorechart
Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
---|---|---|---|
Q4_0 | 36.20 | 3.5550 | 3.61% |
Q4_1 | 40.20 | 3.5125 | 2.37% |
Q5_0 | 44.20 | 3.4744 | 1.26% |
Q2_K | 27.27 | 3.7339 | 8.82% |
Q3_K_S | 27.86 | 3.7019 | 7.89% |
Q3_K_M | 30.83 | 3.5932 | 4.72% |
Q3_K_L | 33.67 | 3.5617 | 3.80% |
Q4_K_S | 36.39 | 3.4852 | 1.57% |
Q4_K_M | 38.54 | 3.4725 | 1.20% |
Q5_K_S | 44.20 | 3.4483 | 0.50% |
Q5_K_M | 45.41 | 3.4451 | 0.40% |
Q6_K | 52.70 | 3.4367 | 0.16% |
fp16 | 128.5 | 3.4313 | - |
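The "Delta to fp16" column can be reproduced from the other two columns: it is the relative increase in perplexity over the fp16 baseline. For example, for Q4_0:

$$\Delta = \frac{\mathrm{PPL}_{\text{quant}} - \mathrm{PPL}_{\text{fp16}}}{\mathrm{PPL}_{\text{fp16}}} \times 100\% = \frac{3.5550 - 3.4313}{3.4313} \times 100\% \approx 3.61\%$$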