From 57985f88ca7e31a1367af4220b544a123eba1272 Mon Sep 17 00:00:00 2001 From: brian khuu Date: Tue, 11 Jun 2024 23:04:09 +1000 Subject: [PATCH] add dev notes --- Tensor-Encoding-Schemes.md | 64 +++++++++++++++++++------------------- _Sidebar.md | 3 +- dev-notes.md | 47 ++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+), 33 deletions(-) create mode 100644 dev-notes.md diff --git a/Tensor-Encoding-Schemes.md b/Tensor-Encoding-Schemes.md index e353927..a3ca17a 100644 --- a/Tensor-Encoding-Schemes.md +++ b/Tensor-Encoding-Schemes.md @@ -27,38 +27,38 @@ This is not definitive, but is helpful when reading sourcecode or console output ## Tensor Encoding Scheme Mapping -| Scheme | `ggml_ftype` C enumeration name | `ggml_type` C enum name | Bits/Weight | Data Type | Block Configuration | Quantized Weight Formula | Initial Commits Or Pull Request Sources (of `ggml_type`) | -| -------- | ------------------------------- | ----------------------- | ----------- | ----------------------------- | ---------------------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------------------------ | -| BF16 | GGML_FTYPE_MOSTLY_BF16 | GGML_TYPE_BF16 | 16 | bfloat16 (trunc 32b IEEE754) | Homogonous Array Of Floating Weights | - | [llama.cpp PR: Introduce bfloat16 support #6412](https://github.com/ggerganov/llama.cpp/pull/6412) | -| F16 | GGML_FTYPE_MOSTLY_F16 | GGML_TYPE_F16 | 16 | 16-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | -| F32 | GGML_FTYPE_ALL_F32 | GGML_TYPE_F32 | 32 | 32-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | -| F64 | - | GGML_TYPE_F64 | 64 | 64-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) | -| I8 | - | GGML_TYPE_I8 | 8 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | -| I16 | - | GGML_TYPE_I16 | 16 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | -| I32 | - | GGML_TYPE_I32 | 32 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | -| I64 | - | GGML_TYPE_I64 | 64 | (signed?) integer | - | - | [llama.cpp PR: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) | -| Q4_0 | GGML_FTYPE_MOSTLY_Q4_0 | GGML_TYPE_Q4_0 | 4 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | -| Q4_1 | GGML_FTYPE_MOSTLY_Q4_1 | GGML_TYPE_Q4_1 | 4 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | -| Q4_1_F16 | GGML_FTYPE_MOSTLY_Q4_1_SOME_F16 | - | 4 | round to nearest quantization | Each block has 32 weights (token embedding and output weights are F16) | w = q * block_scale + block_minimum | [llama.cpp CM: add Q5 WASM SIMD + GGML_FTYPE](https://github.com/ggerganov/llama.cpp/commit/6bc4400e67e6bc4faad3ad3d5e9d8a6576a9752d) | -| Q8_0 | GGML_FTYPE_MOSTLY_Q8_0 | GGML_TYPE_Q8_0 | 8 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp PR: Add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) #1179](https://github.com/ggerganov/llama.cpp/pull/1179) | -| Q8_1 | - | GGML_TYPE_Q8_1 | 8 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp PR: Add Q8_0 quantization for intermediate results #951 (Note: Renamed to Q8_1 in later commit)](https://github.com/ggerganov/llama.cpp/pull/951) | -| Q5_0 | GGML_FTYPE_MOSTLY_Q5_0 | GGML_TYPE_Q5_0 | 5 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) | -| Q5_1 | GGML_FTYPE_MOSTLY_Q5_1 | GGML_TYPE_Q5_1 | 5 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) | -| Q2_K | GGML_FTYPE_MOSTLY_Q2_K | GGML_TYPE_Q2_K | 2.5625 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (4-bit) + block_min (4-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| Q3_K | GGML_FTYPE_MOSTLY_Q3_K | GGML_TYPE_Q3_K | 3.4375 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| Q4_K | GGML_FTYPE_MOSTLY_Q4_K | GGML_TYPE_Q4_K | 4.5 | k-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| Q5_K | GGML_FTYPE_MOSTLY_Q5_K | GGML_TYPE_Q5_K | 5.5 | k-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| Q6_K | GGML_FTYPE_MOSTLY_Q6_K | GGML_TYPE_Q6_K | 6.5625 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (8-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| Q8_K | - | GGML_TYPE_Q8_K | 8.0 | k-quantization | Superblocks has 1 blocks (256 weights per block) (Only used for intermediate quants) | w = q * block_scale (8-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | -| IQ1_S | GGML_FTYPE_MOSTLY_IQ1_S | GGML_TYPE_IQ1_S | 1.5 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: 1.5 bit quantization #5453](https://github.com/ggerganov/llama.cpp/pull/5453) | -| IQ1_M | GGML_FTYPE_MOSTLY_IQ1_M | GGML_TYPE_IQ1_M | 1.75 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ1_M: 1.75 bpw quantization #6302](https://github.com/ggerganov/llama.cpp/pull/6302) | -| IQ2_XXS | GGML_FTYPE_MOSTLY_IQ2_XXS | GGML_TYPE_IQ2_XXS | 2.0625 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 2-bit quants #4773](https://github.com/ggerganov/llama.cpp/pull/4773) | -| IQ2_XS | GGML_FTYPE_MOSTLY_IQ2_XS | GGML_TYPE_IQ2_XS | 2.31 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 2-bit quants - part 2 #4856](https://github.com/ggerganov/llama.cpp/pull/4856) | -| IQ2_S | GGML_FTYPE_MOSTLY_IQ2_S | GGML_TYPE_IQ2_S | 2.5 | i-quantization | ? | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range #5721](https://github.com/ggerganov/llama.cpp/pull/5721) | -| IQ3_S | GGML_FTYPE_MOSTLY_IQ3_S | GGML_TYPE_IQ3_S | 3.4375 | i-quantization | ? | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ3_S: a much better alternative to Q3_K #5676](https://github.com/ggerganov/llama.cpp/pull/5676) | -| IQ3_XXS | GGML_FTYPE_MOSTLY_IQ3_XXS | GGML_TYPE_IQ3_XXS | 3.0625 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 3-bit quants #5196](https://github.com/ggerganov/llama.cpp/pull/5196) | -| IQ4_NL | GGML_FTYPE_MOSTLY_IQ4_NL | GGML_TYPE_IQ4_NL | 4.5 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = [non linear mapping of quants to weights] | [llama.cpp PR: IQ4_NL: 4-bit non-linear quants with blocks of 32 #5590](https://github.com/ggerganov/llama.cpp/pull/5590) | -| IQ4_XS | GGML_FTYPE_MOSTLY_IQ4_XS | GGML_TYPE_IQ4_XS | 4.25 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ4_XS: a 4.25 bpw quantization #5747](https://github.com/ggerganov/llama.cpp/pull/5747) | +| Scheme | `ggml_ftype` C enumeration name | `ggml_type` C enum name | Bits/Weight | Data Type | Block Configuration | Quantized Weight Formula | Initial Commits Or Pull Request Sources (of `ggml_type`) | +|----------|---------------------------------|-------------------------|-------------|-------------------------------|---------------------------------------------------------------------------------------|-------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------| +| BF16 | GGML_FTYPE_MOSTLY_BF16 | GGML_TYPE_BF16 | 16 | bfloat16 (trunc 32b IEEE754) | Homogonous Array Of Floating Weights | - | [llama.cpp PR: Introduce bfloat16 support #6412](https://github.com/ggerganov/llama.cpp/pull/6412) | +| F16 | GGML_FTYPE_MOSTLY_F16 | GGML_TYPE_F16 | 16 | 16-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | +| F32 | GGML_FTYPE_ALL_F32 | GGML_TYPE_F32 | 32 | 32-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | +| F64 | - | GGML_TYPE_F64 | 64 | 64-bit IEEE 754 | Homogonous Array Of Floating Weights | - | [llama.cpp CM: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) | +| I8 | - | GGML_TYPE_I8 | 8 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | +| I16 | - | GGML_TYPE_I16 | 16 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | +| I32 | - | GGML_TYPE_I32 | 32 | (signed?) integer | - | - | [llama.cpp PR: Designate enum vals for integer types #6050](https://github.com/ggerganov/llama.cpp/pull/6050) | +| I64 | - | GGML_TYPE_I64 | 64 | (signed?) integer | - | - | [llama.cpp PR: Add support for I64 and F64 arrays #6062](https://github.com/ggerganov/llama.cpp/pull/6062) | +| Q4_0 | GGML_FTYPE_MOSTLY_Q4_0 | GGML_TYPE_Q4_0 | 4 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | +| Q4_1 | GGML_FTYPE_MOSTLY_Q4_1 | GGML_TYPE_Q4_1 | 4 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp CM: Initial Release](https://github.com/ggerganov/llama.cpp/commit/26c084662903ddaca19bef982831bfb0856e8257) | +| Q4_1_F16 | GGML_FTYPE_MOSTLY_Q4_1_SOME_F16 | - | 4 | round to nearest quantization | Each block has 32 weights (token embedding and output weights are F16) | w = q * block_scale + block_minimum | [llama.cpp CM: add Q5 WASM SIMD + GGML_FTYPE](https://github.com/ggerganov/llama.cpp/commit/6bc4400e67e6bc4faad3ad3d5e9d8a6576a9752d) | +| Q8_0 | GGML_FTYPE_MOSTLY_Q8_0 | GGML_TYPE_Q8_0 | 8 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp PR: Add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) #1179](https://github.com/ggerganov/llama.cpp/pull/1179) | +| Q8_1 | - | GGML_TYPE_Q8_1 | 8 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp PR: Add Q8_0 quantization for intermediate results #951 (Note: Renamed to Q8_1 in later commit)](https://github.com/ggerganov/llama.cpp/pull/951) | +| Q5_0 | GGML_FTYPE_MOSTLY_Q5_0 | GGML_TYPE_Q5_0 | 5 | round to nearest quantization | Each block has 32 weights | w = q * block_scale | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) | +| Q5_1 | GGML_FTYPE_MOSTLY_Q5_1 | GGML_TYPE_Q5_1 | 5 | round to nearest quantization | Each block has 32 weights | w = q * block_scale + block_minimum | [llama.cpp PR: Add Q5_0 and Q5_1 quantization #1187](https://github.com/ggerganov/llama.cpp/pull/1187) | +| Q2_K | GGML_FTYPE_MOSTLY_Q2_K | GGML_TYPE_Q2_K | 2.5625 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (4-bit) + block_min (4-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| Q3_K | GGML_FTYPE_MOSTLY_Q3_K | GGML_TYPE_Q3_K | 3.4375 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| Q4_K | GGML_FTYPE_MOSTLY_Q4_K | GGML_TYPE_Q4_K | 4.5 | k-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| Q5_K | GGML_FTYPE_MOSTLY_Q5_K | GGML_TYPE_Q5_K | 5.5 | k-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = q * block_scale (6-bit) + block_min (6-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| Q6_K | GGML_FTYPE_MOSTLY_Q6_K | GGML_TYPE_Q6_K | 6.5625 | k-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = q * block_scale (8-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| Q8_K | - | GGML_TYPE_Q8_K | 8.0 | k-quantization | Superblocks has 1 blocks (256 weights per block) (Only used for intermediate quants) | w = q * block_scale (8-bit) | [llama.cpp PR: k-quants #1684](https://github.com/ggerganov/llama.cpp/pull/1684) | +| IQ1_S | GGML_FTYPE_MOSTLY_IQ1_S | GGML_TYPE_IQ1_S | 1.5 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: 1.5 bit quantization #5453](https://github.com/ggerganov/llama.cpp/pull/5453) | +| IQ1_M | GGML_FTYPE_MOSTLY_IQ1_M | GGML_TYPE_IQ1_M | 1.75 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ1_M: 1.75 bpw quantization #6302](https://github.com/ggerganov/llama.cpp/pull/6302) | +| IQ2_XXS | GGML_FTYPE_MOSTLY_IQ2_XXS | GGML_TYPE_IQ2_XXS | 2.0625 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 2-bit quants #4773](https://github.com/ggerganov/llama.cpp/pull/4773) | +| IQ2_XS | GGML_FTYPE_MOSTLY_IQ2_XS | GGML_TYPE_IQ2_XS | 2.31 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 2-bit quants - part 2 #4856](https://github.com/ggerganov/llama.cpp/pull/4856) | +| IQ2_S | GGML_FTYPE_MOSTLY_IQ2_S | GGML_TYPE_IQ2_S | 2.5 | i-quantization | ? | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range #5721](https://github.com/ggerganov/llama.cpp/pull/5721) | +| IQ3_S | GGML_FTYPE_MOSTLY_IQ3_S | GGML_TYPE_IQ3_S | 3.4375 | i-quantization | ? | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ3_S: a much better alternative to Q3_K #5676](https://github.com/ggerganov/llama.cpp/pull/5676) | +| IQ3_XXS | GGML_FTYPE_MOSTLY_IQ3_XXS | GGML_TYPE_IQ3_XXS | 3.0625 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: SOTA 3-bit quants #5196](https://github.com/ggerganov/llama.cpp/pull/5196) | +| IQ4_NL | GGML_FTYPE_MOSTLY_IQ4_NL | GGML_TYPE_IQ4_NL | 4.5 | i-quantization | Superblocks has 16 blocks ( 16 weights per block) | w = [non linear mapping of quants to weights] | [llama.cpp PR: IQ4_NL: 4-bit non-linear quants with blocks of 32 #5590](https://github.com/ggerganov/llama.cpp/pull/5590) | +| IQ4_XS | GGML_FTYPE_MOSTLY_IQ4_XS | GGML_TYPE_IQ4_XS | 4.25 | i-quantization | Superblocks has 8 blocks ( 32 weights per block) | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ4_XS: a 4.25 bpw quantization #5747](https://github.com/ggerganov/llama.cpp/pull/5747) | * All superblocks have fp16 scaling factor and contains up to 256 weights. Number of weights in a block must be divisible by 256. (To be confirmed) diff --git a/_Sidebar.md b/_Sidebar.md index 4ec9ead..554277c 100644 --- a/_Sidebar.md +++ b/_Sidebar.md @@ -19,9 +19,10 @@ Useful information for users that doesn't fit into Readme. These are information useful for Maintainers and Developers which does not fit into code comments -* [[Tensor-Encoding-Schemes]] +* [[Tensor Encoding Schemes]] * [[Terminology]] * [[PR And Issue Tickets Maintenance]] +* [[Dev Notes]] # Github Actions Main Branch Status diff --git a/dev-notes.md b/dev-notes.md new file mode 100644 index 0000000..595fde1 --- /dev/null +++ b/dev-notes.md @@ -0,0 +1,47 @@ +# Dev Note + +These are general free form note with pointers to good jumping to point to under +stand the llama.cpp codebase. + +(`@` is a vscode jump to symbol code for your convenience. [Also making a feature request to vscode to be able to jump to file and symbol](https://github.com/microsoft/vscode/issues/214870)) + + +## Where are the definitions? + +[GGUF file structure spec (WARN: As of 2024-06-11 the llama.cpp implementation is the canonical source for now)](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#file-structure) + +All of the gguf structure can be found in `gguf.c` unless stated otherwise + +| GGUF Structure Of Interest | gguf.c reference | vscode search line | +|----------------------------|---------------------------|---------------------| +| Overall File Structure | `struct gguf_context` | `@gguf_context` | +| File Header Structure | `struct gguf_header` | `@gguf_header` | +| Key Value Structure | `struct gguf_kv` | `@gguf_kv` | +| Tensor Info Structure | `struct gguf_tensor_info` | `@gguf_tensor_info` | + + +### Element of Interest (Think of this as an index lookup reference) + +Please use this as an index not as canonical reference. +The purpose of this table is to allow you to quickly locate major elements of +the gguf file standard. + +| GGUF Elements Of Interest | c name | c type | gguf.c reference | vscode search line | +|-------------------------------------------------------|-------------------------|---------------------------|---------------------------|---------------------| +| Magic | magic | `uint8_t[4]` | `struct gguf_header` | `@gguf_header` | +| Version | version | `uint32_t` | `struct gguf_header` | `@gguf_header` | +| Tensor Count | n_tensors | `uint64_t` | `struct gguf_header` | `@gguf_header` | +| Key Value Count | n_kv | `uint64_t` | `struct gguf_header` | `@gguf_header` | +| Key Value Linked List | kv | `gguf_kv *` | `struct gguf_context` | `@gguf_context` | +| Tensor Info Linked List | infos | `gguf_tensor_info *` | `struct gguf_context` | `@gguf_context` | +| Key Value Entry - Key | gguf_kv.key | `gguf_str` | `struct gguf_kv` | `@gguf_kv` | +| Key Value Entry - Type | gguf_kv.type | `gguf_type` | `struct gguf_kv` | `@gguf_kv` | +| Key Value Entry - Type | gguf_kv.value | `gguf_value` | `struct gguf_kv` | `@gguf_kv` | +| Tensor Info Entry - Name | gguf_tensor_info.name | `gguf_str` | `struct gguf_tensor_info` | `@gguf_tensor_info` | +| Tensor Info Entry - Tensor shape dimension count | gguf_tensor_info.n_dim | `uint32_t` | `struct gguf_tensor_info` | `@gguf_tensor_info` | +| Tensor Info Entry - Tensor shape sizing array | gguf_tensor_info.ne | `uint64_t[GGML_MAX_DIMS]` | `struct gguf_tensor_info` | `@gguf_tensor_info` | +| Tensor Info Entry - Tensor Encoding Scheme / Strategy | gguf_tensor_info.type | `ggml_type` | `struct gguf_tensor_info` | `@gguf_tensor_info` | +| Tensor Info Entry - Offset from start of 'data' | gguf_tensor_info.offset | `uint64_t` | `struct gguf_tensor_info` | `@gguf_tensor_info` | +| Alignment | alignment | `size_t` | `struct gguf_context` | `@gguf_context` | +| Offset Of 'Data' From Beginning Of File | offset | `size_t` | `struct gguf_context` | `@gguf_context` | +| Size Of 'Data' In Bytes | size | `size_t` | `struct gguf_context` | `@gguf_context` |