mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 20:40:24 +01:00


llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

* Add support for quantizing already quantized models

* Threaded dequantizing and f16 to f32 conversion

* Clean up thread blocks with spares calculation a bit

* Use std::runtime_error exceptions.

2023-06-10 10:59:17 +03:00
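
The "thread blocks with spares" idea named in the commit message above can be illustrated with a minimal sketch. This is not the code from #1691: `convert_range` and `convert_threaded` are hypothetical names, and the int8-to-float scaling is an assumed stand-in for the real dequantize and f16-to-f32 routines. It only shows one plausible way to split whole blocks across threads and hand the leftover ("spare") blocks to the last thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-block dequantize / f16 -> f32 routine:
// each 8-bit value is simply scaled to a float.
static void convert_range(const int8_t * src, float * dst, size_t n, float scale) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = scale * static_cast<float>(src[i]);
    }
}

// Convert `src` with up to `nthread` workers. Work is split in whole blocks of
// `block_size` elements; blocks that do not divide evenly go to the last thread.
static std::vector<float> convert_threaded(const std::vector<int8_t> & src,
                                           size_t block_size, int nthread, float scale) {
    assert(block_size > 0 && nthread > 0);
    assert(src.size() % block_size == 0);

    std::vector<float> dst(src.size());
    const size_t nblocks           = src.size() / block_size;
    const size_t blocks_per_thread = nblocks / nthread;
    const size_t spare_blocks      = nblocks - blocks_per_thread * nthread;

    std::vector<std::thread> workers;
    size_t offset = 0; // element offset shared by input and output
    for (int t = 0; t < nthread; ++t) {
        const size_t thr_blocks = blocks_per_thread + (t == nthread - 1 ? spare_blocks : 0);
        const size_t thr_elems  = thr_blocks * block_size;
        if (thr_elems > 0) {
            workers.emplace_back(convert_range, src.data() + offset,
                                 dst.data() + offset, thr_elems, scale);
        }
        offset += thr_elems;
    }
    for (auto & w : workers) {
        w.join();
    }
    return dst;
}
```

Each worker writes to a disjoint slice of the output, so no locking is needed. Giving all spare blocks to the last thread keeps the offset arithmetic simple at the cost of a slightly uneven load.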

CMakeLists.txt

Add git-based build information for better issue tracking (#1232)

2023-05-01 18:23:47 +02:00

quantize.cpp

llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

2023-06-10 10:59:17 +03:00

README.md

Overhaul the examples structure

2023-03-25 20:26:40 +02:00

README.md

quantize

TODO