Kawrakow bd2d4e393b
1.5 bit quantization (#5453)
* iq1_s: WIP basics

* iq1_s: CUDA is working

* iq1_s: scalar CPU dot product

* iq1_s: WIP AVX2 dot product - something is not right

* Fix tests

* Fix shadow warnings

* Fix after merge with latest master

* iq1_s: AVX2 finally works

* iq1_s: ARM_NEON dot product. Works, but not very fast

* iq1_s: better grid

* iq1_s: use IQ2_XXS for attn_output

At a cost of 0.04 extra bpw this gives a big improvement in PPL.

* iq1_s: Metal basics

Dequantize works, but not dot product

* iq1_s: Metal works, but quite slow

As usual, Apple Silicon does not like the code I write.

* iq1_s: Tests

* iq1_s: slightly faster dot product

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-18 18:16:55 +02:00