mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-11-25 17:39:23 +01:00
Updated Tensor Encoding Schemes (markdown)
parent
c5ee522ea7
commit
584c3525b5
@ -127,7 +127,5 @@ The 12 bytes in Q4_K `.scales` are packed a bit like this, where the uppercased
11: hhhhHHHH
```
Note that this is packing a 6bit scale and mins but split across multiple bytes. This use of byte offsets and bitwise operations is likely done to be more friendlier to parallel processing.
Note that this packs 6-bit scales and mins, with some values split across multiple bytes. This use of byte offsets and bitwise operations is likely meant to be friendlier to SIMD processing. [compilade](https://github.com/compilade) noted that, as far as he can tell, indexing is only done at the byte level, so packing and unpacking the 6-bit values in this block requires bitwise operations. Anecdotally, he also found while writing the `vec_dot` for Q1_3 that SIMD shuffles are surprisingly as fast as additions.
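To make the byte-level indexing concrete, here is a minimal Python sketch of how the eight 6-bit scales and eight 6-bit mins could be packed into and unpacked from the 12 `.scales` bytes. It assumes the layout described above: bytes 0-3 hold scales 0-3 in their low 6 bits, bytes 4-7 hold mins 0-3 in their low 6 bits, bytes 8-11 hold the low 4 bits of scales 4-7 (low nibble) and mins 4-7 (high nibble), and the top 2 bits of bytes 0-7 carry the remaining high bits. The function names `unpack_scale_min` and `pack_scales_mins` are hypothetical and only for illustration; this is not the ggml implementation itself.

```python
def unpack_scale_min(j: int, packed: bytes) -> tuple:
    """Return the (scale, min) 6-bit pair for sub-block j (0..7).

    Assumed layout of the 12 packed bytes:
      bytes 0..3  : low 6 bits = scale j,     top 2 bits = high bits of scale j+4
      bytes 4..7  : low 6 bits = min j-4,     top 2 bits = high bits of min j
      bytes 8..11 : low nibble = scale j low bits, high nibble = min j low bits
    """
    if len(packed) != 12:
        raise ValueError("expected exactly 12 packed bytes")
    if not 0 <= j < 8:
        raise IndexError("sub-block index must be in 0..7")
    if j < 4:
        # The first four pairs sit whole in the low 6 bits of one byte each.
        scale = packed[j] & 0x3F
        minimum = packed[j + 4] & 0x3F
    else:
        # The last four pairs are split: 4 low bits in a nibble of bytes 8..11,
        # 2 high bits borrowed from the top of an earlier byte.
        scale = (packed[j + 4] & 0x0F) | ((packed[j - 4] >> 6) << 4)
        minimum = (packed[j + 4] >> 4) | ((packed[j] >> 6) << 4)
    return scale, minimum


def pack_scales_mins(scales, mins) -> bytes:
    """Inverse of unpack_scale_min: pack 8 scales and 8 mins (each 0..63)."""
    if len(scales) != 8 or len(mins) != 8:
        raise ValueError("expected 8 scales and 8 mins")
    out = bytearray(12)
    for j in range(8):
        s, m = scales[j], mins[j]
        if not (0 <= s < 64 and 0 <= m < 64):
            raise ValueError("scales and mins must fit in 6 bits")
        if j < 4:
            out[j] |= s
            out[j + 4] |= m
        else:
            out[j + 4] = (s & 0x0F) | ((m & 0x0F) << 4)  # low nibbles
            out[j - 4] |= (s >> 4) << 6                  # high 2 bits of scale
            out[j] |= (m >> 4) << 6                      # high 2 bits of min
    return bytes(out)
```

Every access indexes a whole byte and then masks or shifts, which matches the observation that the 6-bit values cannot be addressed directly and have to be recovered with bitwise operations.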