From 57985f88ca7e31a1367af4220b544a123eba1272 Mon Sep 17 00:00:00 2001
From: brian khuu <mofosyne@gmail.com>
Date: Tue, 11 Jun 2024 23:04:09 +1000
Subject: [PATCH] add dev notes

---
 Tensor-Encoding-Schemes.md | 64 +++++++++++++++++++-------------------
 _Sidebar.md                |  3 +-
 dev-notes.md               | 47 ++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 33 deletions(-)
 create mode 100644 dev-notes.md

diff --git a/Tensor-Encoding-Schemes.md b/Tensor-Encoding-Schemes.md
index e353927..a3ca17a 100644
--- a/Tensor-Encoding-Schemes.md
+++ b/Tensor-Encoding-Schemes.md
@@ -27,38 +27,38 @@ This is not definitive, but is helpful when reading sourcecode or console output
 
 ## Tensor Encoding Scheme Mapping
 
-| Scheme   | `ggml_ftype` C enumeration name | `ggml_type` C enum name | Bits/Weight | Data Type                     | Block Configuration                                                    | Quantized Weight Formula                        | Initial Commits Or Pull Request Sources (of `ggml_type`)                 |
-| -------- | ------------------------------- | ----------------------- | ----------- | ----------------------------- | ---------------------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------------------------ |
-| BF16     | GGML_FTYPE_MOSTLY_BF16          | GGML_TYPE_BF16          | 16          | bfloat16 (trunc 32b IEEE754)  | Homogonous Array Of Floating Weights                                   | -                                               | [llama.cpp PR: Introduce bfloat16 support #6412](https://github.com/ggerganov/llama.cpp/pull/6412) |
-| F16      | GGML_FTYPE_MOSTLY_F16           | GGML_TYPE_F16           | 16          | 16-bit IEEE 754               | Homogonous Array Of Floating Weights                                   | -                                               | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) |
-| F32      | GGML_FTYPE_ALL_F32              | GGML_TYPE_F32           | 32          | 32-bit IEEE 754               | Homogonous Array Of Floating Weights                                   | -                                               | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) |
-| F64      | -                               | GGML_TYPE_F64           | 64          | 64-bit IEEE 754               | Homogonous Array Of Floating Weights                                   | -                                               | [llama.cpp CM: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) |
-| I8       | -                               | GGML_TYPE_I8            | 8           | (signed?) integer             | -                                                                      | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) |
-| I16      | -                               | GGML_TYPE_I16           | 16          | (signed?) integer             | -                                                                      | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) |
-| I32      | -                               | GGML_TYPE_I32           | 32          | (signed?) integer             | -                                                                      | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) |
-| I64      | -                               | GGML_TYPE_I64           | 64          | (signed?) integer             | -                                                                      | -                                               | [llama.cpp PR: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) |
-| Q4_0     | GGML_FTYPE_MOSTLY_Q4_0          | GGML_TYPE_Q4_0          | 4           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale                             | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) |
-| Q4_1     | GGML_FTYPE_MOSTLY_Q4_1          | GGML_TYPE_Q4_1          | 4           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale + block_minimum             | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) |
-| Q4_1_F16 | GGML_FTYPE_MOSTLY_Q4_1_SOME_F16 | -                       | 4           | round to nearest quantization | Each block has 32 weights (token embedding and output weights are F16) | w = q * block_scale + block_minimum             | [llama.cpp CM: add Q5 WASM SIMD + GGML_FTYPE](https://github.com/ggerganov/llama.cpp/commit/6bc4400e67e6bc4faad3ad3d5e9d8a6576a9752d) |
-| Q8_0     | GGML_FTYPE_MOSTLY_Q8_0          | GGML_TYPE_Q8_0          | 8           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale                             | [llama.cpp PR: Add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) #1179](https://github.com/ggerganov/llama.cpp/pull/1179) |
-| Q8_1     | -                               | GGML_TYPE_Q8_1          | 8           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale + block_minimum             | [llama.cpp PR: Add Q8_0 quantization for intermediate results #951 (Note: Renamed to Q8_1 in later commit)](https://github.com/ggerganov/llama.cpp/pull/951) |
-| Q5_0     | GGML_FTYPE_MOSTLY_Q5_0          | GGML_TYPE_Q5_0          | 5           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale                             | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) |
-| Q5_1     | GGML_FTYPE_MOSTLY_Q5_1          | GGML_TYPE_Q5_1          | 5           | round to nearest quantization | Each block has 32 weights                                              | w = q * block_scale + block_minimum             | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) |
-| Q2_K     | GGML_FTYPE_MOSTLY_Q2_K          | GGML_TYPE_Q2_K          | 2.5625      | k-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = q * block_scale (4-bit) + block_min (4-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| Q3_K     | GGML_FTYPE_MOSTLY_Q3_K          | GGML_TYPE_Q3_K          | 3.4375      | k-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = q * block_scale (6-bit)                     | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| Q4_K     | GGML_FTYPE_MOSTLY_Q4_K          | GGML_TYPE_Q4_K          | 4.5         | k-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| Q5_K     | GGML_FTYPE_MOSTLY_Q5_K          | GGML_TYPE_Q5_K          | 5.5         | k-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| Q6_K     | GGML_FTYPE_MOSTLY_Q6_K          | GGML_TYPE_Q6_K          | 6.5625      | k-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = q * block_scale (8-bit)                     | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| Q8_K     | -                               | GGML_TYPE_Q8_K          | 8.0         | k-quantization                | Superblocks has  1 blocks (256 weights per block) (Only used for intermediate quants) | w = q * block_scale (8-bit)      | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) |
-| IQ1_S    | GGML_FTYPE_MOSTLY_IQ1_S         | GGML_TYPE_IQ1_S         | 1.5         | i-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: 1.5 bit quantization #5453](https://github.com/ggerganov/llama.cpp/pull/5453) |
-| IQ1_M    | GGML_FTYPE_MOSTLY_IQ1_M         | GGML_TYPE_IQ1_M         | 1.75        | i-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ1_M: 1.75 bpw quantization #6302](https://github.com/ggerganov/llama.cpp/pull/6302) |
-| IQ2_XXS  | GGML_FTYPE_MOSTLY_IQ2_XXS       | GGML_TYPE_IQ2_XXS       | 2.0625      | i-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 2-bit quants #4773](https://github.com/ggerganov/llama.cpp/pull/4773) |
-| IQ2_XS   | GGML_FTYPE_MOSTLY_IQ2_XS        | GGML_TYPE_IQ2_XS        | 2.31        | i-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 2-bit quants - part 2 #4856](https://github.com/ggerganov/llama.cpp/pull/4856) |
-| IQ2_S    | GGML_FTYPE_MOSTLY_IQ2_S         | GGML_TYPE_IQ2_S         | 2.5         | i-quantization                | ?                                                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range #5721](https://github.com/ggerganov/llama.cpp/pull/5721) |
-| IQ3_S    | GGML_FTYPE_MOSTLY_IQ3_S         | GGML_TYPE_IQ3_S         | 3.4375      | i-quantization                | ?                                                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ3_S: a much better alternative to Q3_K #5676](https://github.com/ggerganov/llama.cpp/pull/5676) |
-| IQ3_XXS  | GGML_FTYPE_MOSTLY_IQ3_XXS       | GGML_TYPE_IQ3_XXS       | 3.0625      | i-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 3-bit quants #5196](https://github.com/ggerganov/llama.cpp/pull/5196) |
-| IQ4_NL   | GGML_FTYPE_MOSTLY_IQ4_NL        | GGML_TYPE_IQ4_NL        | 4.5         | i-quantization                | Superblocks has 16 blocks ( 16 weights per block)                      | w = [non linear mapping of quants to weights]   | [llama.cpp PR: IQ4_NL: 4-bit non-linear quants with blocks of 32 #5590](https://github.com/ggerganov/llama.cpp/pull/5590) |
-| IQ4_XS   | GGML_FTYPE_MOSTLY_IQ4_XS        | GGML_TYPE_IQ4_XS        | 4.25        | i-quantization                | Superblocks has  8 blocks ( 32 weights per block)                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ4_XS: a 4.25 bpw quantization #5747](https://github.com/ggerganov/llama.cpp/pull/5747) |
+| Scheme   | `ggml_ftype` C enumeration name | `ggml_type` C enum name | Bits/Weight | Data Type                     | Block Configuration                                                                   | Quantized Weight Formula                        | Initial Commits Or Pull Request Sources (of `ggml_type`)                                                                                                     |
+|----------|---------------------------------|-------------------------|-------------|-------------------------------|---------------------------------------------------------------------------------------|-------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| BF16     | GGML_FTYPE_MOSTLY_BF16          | GGML_TYPE_BF16          | 16          | bfloat16 (trunc 32b IEEE754)  | Homogeneous Array Of Floating Weights                                                 | -                                               | [llama.cpp PR: Introduce bfloat16 support #6412](https://github.com/ggerganov/llama.cpp/pull/6412)                                                           |
+| F16      | GGML_FTYPE_MOSTLY_F16           | GGML_TYPE_F16           | 16          | 16-bit IEEE 754               | Homogeneous Array Of Floating Weights                                                 | -                                               | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257)                                      |
+| F32      | GGML_FTYPE_ALL_F32              | GGML_TYPE_F32           | 32          | 32-bit IEEE 754               | Homogeneous Array Of Floating Weights                                                 | -                                               | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257)                                      |
+| F64      | -                               | GGML_TYPE_F64           | 64          | 64-bit IEEE 754               | Homogeneous Array Of Floating Weights                                                 | -                                               | [llama.cpp PR: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062)                                                   |
+| I8       | -                               | GGML_TYPE_I8            | 8           | (signed?) integer             | -                                                                                     | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050)                                                |
+| I16      | -                               | GGML_TYPE_I16           | 16          | (signed?) integer             | -                                                                                     | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050)                                                |
+| I32      | -                               | GGML_TYPE_I32           | 32          | (signed?) integer             | -                                                                                     | -                                               | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050)                                                |
+| I64      | -                               | GGML_TYPE_I64           | 64          | (signed?) integer             | -                                                                                     | -                                               | [llama.cpp PR: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062)                                                   |
+| Q4_0     | GGML_FTYPE_MOSTLY_Q4_0          | GGML_TYPE_Q4_0          | 4           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale                             | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257)                                      |
+| Q4_1     | GGML_FTYPE_MOSTLY_Q4_1          | GGML_TYPE_Q4_1          | 4           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale + block_minimum             | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257)                                      |
+| Q4_1_F16 | GGML_FTYPE_MOSTLY_Q4_1_SOME_F16 | -                       | 4           | round to nearest quantization | Each block has 32 weights (token embedding and output weights are F16)                | w = q * block_scale + block_minimum             | [llama.cpp CM: add Q5 WASM SIMD + GGML_FTYPE](https://github.com/ggerganov/llama.cpp/commit/6bc4400e67e6bc4faad3ad3d5e9d8a6576a9752d)                        |
+| Q8_0     | GGML_FTYPE_MOSTLY_Q8_0          | GGML_TYPE_Q8_0          | 8           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale                             | [llama.cpp PR: Add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) #1179](https://github.com/ggerganov/llama.cpp/pull/1179)                 |
+| Q8_1     | -                               | GGML_TYPE_Q8_1          | 8           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale + block_minimum             | [llama.cpp PR: Add Q8_0 quantization for intermediate results #951 (Note: Renamed to Q8_1 in later commit)](https://github.com/ggerganov/llama.cpp/pull/951) |
+| Q5_0     | GGML_FTYPE_MOSTLY_Q5_0          | GGML_TYPE_Q5_0          | 5           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale                             | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187)                                                       |
+| Q5_1     | GGML_FTYPE_MOSTLY_Q5_1          | GGML_TYPE_Q5_1          | 5           | round to nearest quantization | Each block has 32 weights                                                             | w = q * block_scale + block_minimum             | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187)                                                       |
+| Q2_K     | GGML_FTYPE_MOSTLY_Q2_K          | GGML_TYPE_Q2_K          | 2.5625      | k-quantization                | Superblock has 16 blocks ( 16 weights per block)                                      | w = q * block_scale (4-bit) + block_min (4-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| Q3_K     | GGML_FTYPE_MOSTLY_Q3_K          | GGML_TYPE_Q3_K          | 3.4375      | k-quantization                | Superblock has 16 blocks ( 16 weights per block)                                      | w = q * block_scale (6-bit)                     | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| Q4_K     | GGML_FTYPE_MOSTLY_Q4_K          | GGML_TYPE_Q4_K          | 4.5         | k-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| Q5_K     | GGML_FTYPE_MOSTLY_Q5_K          | GGML_TYPE_Q5_K          | 5.5         | k-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| Q6_K     | GGML_FTYPE_MOSTLY_Q6_K          | GGML_TYPE_Q6_K          | 6.5625      | k-quantization                | Superblock has 16 blocks ( 16 weights per block)                                      | w = q * block_scale (8-bit)                     | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| Q8_K     | -                               | GGML_TYPE_Q8_K          | 8.0         | k-quantization                | Superblock has  1 block (256 weights per block) (Only used for intermediate quants)   | w = q * block_scale (8-bit)                     | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684)                                                                             |
+| IQ1_S    | GGML_FTYPE_MOSTLY_IQ1_S         | GGML_TYPE_IQ1_S         | 1.5         | i-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: 1.5 bit quantization #5453](https://github.com/ggerganov/llama.cpp/pull/5453)                                                                 |
+| IQ1_M    | GGML_FTYPE_MOSTLY_IQ1_M         | GGML_TYPE_IQ1_M         | 1.75        | i-quantization                | Superblock has 16 blocks ( 16 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ1_M: 1.75 bpw quantization #6302](https://github.com/ggerganov/llama.cpp/pull/6302)                                                         |
+| IQ2_XXS  | GGML_FTYPE_MOSTLY_IQ2_XXS       | GGML_TYPE_IQ2_XXS       | 2.0625      | i-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 2-bit quants #4773](https://github.com/ggerganov/llama.cpp/pull/4773)                                                                    |
+| IQ2_XS   | GGML_FTYPE_MOSTLY_IQ2_XS        | GGML_TYPE_IQ2_XS        | 2.31        | i-quantization                | Superblock has 16 blocks ( 16 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 2-bit quants - part 2 #4856](https://github.com/ggerganov/llama.cpp/pull/4856)                                                           |
+| IQ2_S    | GGML_FTYPE_MOSTLY_IQ2_S         | GGML_TYPE_IQ2_S         | 2.5         | i-quantization                | ?                                                                                     | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range #5721](https://github.com/ggerganov/llama.cpp/pull/5721)        |
+| IQ3_S    | GGML_FTYPE_MOSTLY_IQ3_S         | GGML_TYPE_IQ3_S         | 3.4375      | i-quantization                | ?                                                                                     | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ3_S: a much better alternative to Q3_K #5676](https://github.com/ggerganov/llama.cpp/pull/5676)                                             |
+| IQ3_XXS  | GGML_FTYPE_MOSTLY_IQ3_XXS       | GGML_TYPE_IQ3_XXS       | 3.0625      | i-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: SOTA 3-bit quants #5196](https://github.com/ggerganov/llama.cpp/pull/5196)                                                                    |
+| IQ4_NL   | GGML_FTYPE_MOSTLY_IQ4_NL        | GGML_TYPE_IQ4_NL        | 4.5         | i-quantization                | Each block has 32 weights                                                             | w = [non linear mapping of quants to weights]   | [llama.cpp PR: IQ4_NL: 4-bit non-linear quants with blocks of 32 #5590](https://github.com/ggerganov/llama.cpp/pull/5590)                                    |
+| IQ4_XS   | GGML_FTYPE_MOSTLY_IQ4_XS        | GGML_TYPE_IQ4_XS        | 4.25        | i-quantization                | Superblock has  8 blocks ( 32 weights per block)                                      | w = func(superblock_scale, importance_matrix)   | [llama.cpp PR: IQ4_XS: a 4.25 bpw quantization #5747](https://github.com/ggerganov/llama.cpp/pull/5747)                                                      |
 
 * All superblocks have fp16 scaling factor and contains up to 256 weights. Number of weights in a block must be divisible by 256. (To be confirmed)
 
diff --git a/_Sidebar.md b/_Sidebar.md
index 4ec9ead..554277c 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -19,9 +19,10 @@ Useful information for users that doesn't fit into Readme.
 
 These are information useful for Maintainers and Developers which does not fit into code comments
 
-* [[Tensor-Encoding-Schemes]]
+* [[Tensor Encoding Schemes]]
 * [[Terminology]]
 * [[PR And Issue Tickets Maintenance]]
+* [[Dev Notes]]
 
 # Github Actions Main Branch Status
 
diff --git a/dev-notes.md b/dev-notes.md
new file mode 100644
index 0000000..595fde1
--- /dev/null
+++ b/dev-notes.md
@@ -0,0 +1,47 @@
+# Dev Notes
+
+These are general free-form notes with pointers to good starting points for
+understanding the llama.cpp codebase.
+
+(`@<symbol>` is a VS Code jump-to-symbol query, included for your convenience. [A feature request has also been filed with VS Code to allow jumping to a file and symbol together](https://github.com/microsoft/vscode/issues/214870))
+
+
+## Where are the definitions?
+
+[GGUF file structure spec (WARN: As of 2024-06-11 the llama.cpp implementation is the canonical source for now)](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#file-structure)
+
+All of the GGUF structures can be found in `gguf.c` unless stated otherwise.
+
+| GGUF Structure Of Interest | gguf.c reference          | vscode search line  |
+|----------------------------|---------------------------|---------------------|
+| Overall File Structure     | `struct gguf_context`     | `@gguf_context`     |
+| File Header Structure      | `struct gguf_header`      | `@gguf_header`      |
+| Key Value Structure        | `struct gguf_kv`          | `@gguf_kv`          |
+| Tensor Info Structure      | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+
+
+### Elements of Interest (Think of this as an index lookup reference)
+
+Please use this as an index, not as a canonical reference.
+The purpose of this table is to let you quickly locate the major elements of
+the GGUF file standard.
+
+| GGUF Elements Of Interest                             | c name                  | c type                    | gguf.c reference          | vscode search line  |
+|-------------------------------------------------------|-------------------------|---------------------------|---------------------------|---------------------|
+| Magic                                                 | magic                   | `uint8_t[4]`              | `struct gguf_header`      | `@gguf_header`      |
+| Version                                               | version                 | `uint32_t`                | `struct gguf_header`      | `@gguf_header`      |
+| Tensor Count                                          | n_tensors               | `uint64_t`                | `struct gguf_header`      | `@gguf_header`      |
+| Key Value Count                                       | n_kv                    | `uint64_t`                | `struct gguf_header`      | `@gguf_header`      |
+| Key Value Linked List                                 | kv                      | `gguf_kv *`               | `struct gguf_context`     | `@gguf_context`     |
+| Tensor Info Linked List                               | infos                   | `gguf_tensor_info *`      | `struct gguf_context`     | `@gguf_context`     |
+| Key Value Entry - Key                                 | gguf_kv.key             | `gguf_str`                | `struct gguf_kv`          | `@gguf_kv`          |
+| Key Value Entry - Type                                | gguf_kv.type            | `gguf_type`               | `struct gguf_kv`          | `@gguf_kv`          |
+| Key Value Entry - Value                               | gguf_kv.value           | `gguf_value`              | `struct gguf_kv`          | `@gguf_kv`          |
+| Tensor Info Entry - Name                              | gguf_tensor_info.name   | `gguf_str`                | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+| Tensor Info Entry - Tensor shape dimension count      | gguf_tensor_info.n_dim  | `uint32_t`                | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+| Tensor Info Entry - Tensor shape sizing array         | gguf_tensor_info.ne     | `uint64_t[GGML_MAX_DIMS]` | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+| Tensor Info Entry - Tensor Encoding Scheme / Strategy | gguf_tensor_info.type   | `ggml_type`               | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+| Tensor Info Entry - Offset from start of 'data'       | gguf_tensor_info.offset | `uint64_t`                | `struct gguf_tensor_info` | `@gguf_tensor_info` |
+| Alignment                                             | alignment               | `size_t`                  | `struct gguf_context`     | `@gguf_context`     |
+| Offset Of 'Data' From Beginning Of File               | offset                  | `size_t`                  | `struct gguf_context`     | `@gguf_context`     |
+| Size Of 'Data' In Bytes                               | size                    | `size_t`                  | `struct gguf_context`     | `@gguf_context`     |
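
As a companion to the `struct gguf_header` rows in the table above, here is a minimal Python sketch of reading those four fields from the start of a GGUF file. It makes two assumptions that the table itself does not state: the fields are stored back to back in little-endian byte order, and the magic bytes spell ASCII `GGUF` as described in the linked GGUF spec. The names `GgufHeader` and `parse_gguf_header` are hypothetical, chosen for illustration only; they are not part of llama.cpp.

```python
import struct
from typing import NamedTuple

# Field order and widths follow the `struct gguf_header` rows above:
#   magic uint8_t[4], version uint32_t, n_tensors uint64_t, n_kv uint64_t.
# ASSUMPTION: little-endian layout with no padding between fields.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 8 + 8 bytes


class GgufHeader(NamedTuple):
    """Hypothetical container mirroring the header fields in the table."""
    magic: bytes
    version: int
    n_tensors: int
    n_kv: int


def parse_gguf_header(buf: bytes) -> GgufHeader:
    """Decode the fixed-size header from the first bytes of a GGUF file."""
    if len(buf) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} bytes, got {len(buf)}")
    header = GgufHeader(*struct.unpack_from(HEADER_FORMAT, buf))
    # ASSUMPTION: the magic is ASCII "GGUF", per the linked GGUF spec.
    if header.magic != b"GGUF":
        raise ValueError(f"unexpected magic: {header.magic!r}")
    return header


if __name__ == "__main__":
    # Build a synthetic header rather than relying on a real model file.
    sample = struct.pack(HEADER_FORMAT, b"GGUF", 3, 2, 5)
    print(parse_gguf_header(sample))
```

To inspect a real file, pass the first `HEADER_SIZE` bytes read in binary mode to `parse_gguf_header`. The key-value entries and tensor info entries that follow the header are variable-length, so they cannot be decoded with a single fixed format string like this one.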