llama.cpp/grammars/README.md

# GBNF Guide

GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/server`.

## Background

[Bakus-Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.

## Basics

In GBNF, we define *production rules* that specify how a *non-terminal* (rule name) can be replaced with sequences of *terminals* (characters, specifically Unicode [code points](https://en.wikipedia.org/wiki/Code_point)) and other non-terminals. The basic format of a production rule is `nonterminal ::= sequence...`.

## Example

Before going deeper, let's look at some of the features demonstrated in `grammars/chess.gbnf`, a small chess notation grammar:
```
# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"

    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)

# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...
```

## Non-Terminals and Terminals

Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like `move`, `castle`, or `check-mate`.

Terminals are actual characters ([code points](https://en.wikipedia.org/wiki/Code_point)). They can be specified as a sequence like `"1"` or `"O-O"` or as ranges like `[1-9]` or `[NBKQR]`.

## Characters and character ranges

Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example `hiragana ::= [ぁ-ゟ]`, or with escapes: 8-bit (`\xXX`), 16-bit (`\uXXXX`) or 32-bit (`\UXXXXXXXX`).

Character ranges can be negated with `^`:
```
single-line ::= [^\n]+ "\n"`
```

## Sequences and Alternatives

The order of symbols in a sequence matters. For example, in `"1. " move " " move "\n"`, the `"1. "` must come before the first `move`, etc.

Alternatives, denoted by `|`, give different sequences that are acceptable. For example, in `move ::= pawn | nonpawn | castle`, `move` can be a `pawn` move, a `nonpawn` move, or a `castle`.

Parentheses `()` can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.

## Repetition and Optional Symbols

- `*` after a symbol or sequence means that it can be repeated zero or more times (equivalent to `{0,}`).
- `+` denotes that the symbol or sequence should appear one or more times (equivalent to `{1,}`).
- `?` makes the preceding symbol or sequence optional (equivalent to `{0,1}`).
- `{m}` repeats the precedent symbol or sequence exactly `m` times
- `{m,}` repeats the precedent symbol or sequence at least `m` times
- `{m,n}` repeats the precedent symbol or sequence at between `m` and `n` times (included)
- `{0,n}` repeats the precedent symbol or sequence at most `n` times (included)

## Comments and newlines

Comments can be specified with `#`:
```
# defines optional whitespace
ws ::= [ \t\n]+
```

Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker `|` will continue the current rule, even outside of parentheses.

## The root rule

In a full grammar, the `root` rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

```
# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
```

## Next steps

This guide provides a brief overview. Check out the GBNF files in this directory (`grammars/`) for examples of full grammars. You can try them out with:
```
./main -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
```

`llama.cpp` can also convert JSON schemas to grammars either ahead of time or at each request, see below.

## Troubleshooting

Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).

### Efficient optional repetitions

A common pattern is to allow repetitions of a pattern `x` up to N times.

While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) may result in extremely slow sampling. Instead, you can write `x{0,N}` (or `(x (x (x ... (x)?...)?)?)?` w/ N-deep nesting in earlier llama.cpp versions).

## Using GBNF grammars

You can use GBNF grammars:

- In the [server](../examples/server)'s completion endpoints, passed as the `grammar` body field
- In the [main](../examples/main) CLI, passed as the `--grammar` & `--grammar-file` flags
- With the [gbnf-validator](../examples/gbnf-validator) tool, to test them against strings.

## JSON Schemas → GBNF

`llama.cpp` supports converting a subset of https://json-schema.org/ to GBNF grammars:

- In the [server](../examples/server):
    - For any completion endpoints, passed as the `json_schema` body field
    - For the `/chat/completions` endpoint, passed inside the `result_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}`)
- In the [main](../examples/main) CLI, passed as the `--json` / `-j` flag
- To convert to a grammar ahead of time:
    - in CLI, with [json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
    - in JavaScript with [json-schema-to-grammar.mjs](../examples/server/public/json-schema-to-grammar.mjs) (this is used by the [server](../examples/server)'s Web UI)

Take a look at [tests](../../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).

Here is also a non-exhaustive list of **unsupported** features:

- `additionalProperties`: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840
- `minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`
    - `integer` constraints to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797
- Remote `$ref`s in the C++ version (Python & JavaScript versions fetch https refs)
- Mixing `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
- `string` formats `uri`, `email`
- [`contains`](https://json-schema.org/draft/2020-12/json-schema-core#name-contains) / `minContains`
- `uniqueItems`
- `$anchor` (cf. [dereferencing](https://json-schema.org/draft/2020-12/json-schema-core#name-dereferencing))
- [`not`](https://json-schema.org/draft/2020-12/json-schema-core#name-not)
- [Conditionals](https://json-schema.org/draft/2020-12/json-schema-core#name-keywords-for-applying-subsche) `if` / `then` / `else` / `dependentSchemas`
- [`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)
docs : add grammar docs (#2701) * docs : add grammar docs * tweaks to grammar guide * rework GBNF example to be a commented grammar 2023-08-23 03:01:57 +02:00			`# GBNF Guide`

			GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/server`.

			`## Background`

			`[Bakus-Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.`

			`## Basics`

			In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode [code points](https://en.wikipedia.org/wiki/Code_point)) and other non-terminals. The basic format of a production rule is `nonterminal ::= sequence...`.

			`## Example`

			Before going deeper, let's look at some of the features demonstrated in `grammars/chess.gbnf`, a small chess notation grammar:
			```
			# `root` specifies the pattern for the overall output
			`root ::= (`
			`# it must start with the characters "1. " followed by a sequence`
			# of characters that match the `move` rule, followed by a space, followed
			`# by another move, and then a newline`
			`"1. " move " " move "\n"`

			`# it's followed by one or more subsequent moves, numbered with one or two digits`
			`([1-9] [0-9]? ". " move " " move "\n")+`
			`)`

			# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
			# The `[+#]?` denotes the possibility of checking or mate signs after moves
			`move ::= (pawn \| nonpawn \| castle) [+#]?`

			`pawn ::= ...`
			`nonpawn ::= ...`
			`castle ::= ...`
			```

			`## Non-Terminals and Terminals`

			Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like `move`, `castle`, or `check-mate`.

			Terminals are actual characters ([code points](https://en.wikipedia.org/wiki/Code_point)). They can be specified as a sequence like `"1"` or `"O-O"` or as ranges like `[1-9]` or `[NBKQR]`.

			`## Characters and character ranges`

			Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example `hiragana ::= [ぁ-ゟ]`, or with escapes: 8-bit (`\xXX`), 16-bit (`\uXXXX`) or 32-bit (`\UXXXXXXXX`).

			Character ranges can be negated with `^`:
			```
			single-line ::= [^\n]+ "\n"`
			```

			`## Sequences and Alternatives`

docs: fix typos (#7124) * fix typo * fix typos * fix typo * fix typos * fix typo * fix typos 2024-05-07 17:20:33 +02:00			The order of symbols in a sequence matters. For example, in `"1. " move " " move "\n"`, the `"1. "` must come before the first `move`, etc.
docs : add grammar docs (#2701) * docs : add grammar docs * tweaks to grammar guide * rework GBNF example to be a commented grammar 2023-08-23 03:01:57 +02:00
			Alternatives, denoted by `\|`, give different sequences that are acceptable. For example, in `move ::= pawn \| nonpawn \| castle`, `move` can be a `pawn` move, a `nonpawn` move, or a `castle`.

Fix some documentation typos/grammar mistakes (#4032) * typos * Update examples/parallel/README.md Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> --------- Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> 2023-11-12 07:04:58 +01:00			Parentheses `()` can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
docs : add grammar docs (#2701) * docs : add grammar docs * tweaks to grammar guide * rework GBNF example to be a commented grammar 2023-08-23 03:01:57 +02:00
			`## Repetition and Optional Symbols`

grammars: x{min,max} repetition operator (#6640) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com> 2024-06-06 11:07:06 +02:00			- `*` after a symbol or sequence means that it can be repeated zero or more times (equivalent to `{0,}`).
			- `+` denotes that the symbol or sequence should appear one or more times (equivalent to `{1,}`).
			- `?` makes the preceding symbol or sequence optional (equivalent to `{0,1}`).
			- `{m}` repeats the precedent symbol or sequence exactly `m` times
			- `{m,}` repeats the precedent symbol or sequence at least `m` times
			- `{m,n}` repeats the precedent symbol or sequence at between `m` and `n` times (included)
			- `{0,n}` repeats the precedent symbol or sequence at most `n` times (included)
docs : add grammar docs (#2701) * docs : add grammar docs * tweaks to grammar guide * rework GBNF example to be a commented grammar 2023-08-23 03:01:57 +02:00
			`## Comments and newlines`

			Comments can be specified with `#`:
			```
Fix some documentation typos/grammar mistakes (#4032) * typos * Update examples/parallel/README.md Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> --------- Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> 2023-11-12 07:04:58 +01:00			`# defines optional whitespace`
docs : add grammar docs (#2701) * docs : add grammar docs * tweaks to grammar guide * rework GBNF example to be a commented grammar 2023-08-23 03:01:57 +02:00			`ws ::= [ \t\n]+`
			```

			Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker `\|` will continue the current rule, even outside of parentheses.

			`## The root rule`

			In a full grammar, the `root` rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

			```
			`# a grammar for lists`
			`root ::= ("- " item)+`
			`item ::= [^\n]+ "\n"`
			```

			`## Next steps`

			This guide provides a brief overview. Check out the GBNF files in this directory (`grammars/`) for examples of full grammars. You can try them out with:
			```
			`./main -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'`
			```
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) * json: rename python schema converter to make import easier * server: skip null json_schema / grammar fields * json: deps management for primitive rules (+ allow null values) * json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?` * grammars: add troubleshooting section to readme * json: cap length of numbers to 15 digits before/after decimal point (avoids infinite gen, e.g. "one third" -> `0.333333333333...`) * json: unify all repetition code (w/ or w/o sep) * json: support string minLength/maxLength * server+json: update server/README w/ result_format * nits * json: fix type error w/ python 3.8 * json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions) * json: simplify DOT `{"type": "string", "pattern": "^.$"}` * json: remove recursion in opt_repetitions (avoids Python stack overflow) * json: rm dead code * json: rm useless assert & ggml.h import 2024-04-12 20:43:38 +02:00
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme 2024-06-11 02:00:30 +02:00			`llama.cpp` can also convert JSON schemas to grammars either ahead of time or at each request, see below.

JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) * json: rename python schema converter to make import easier * server: skip null json_schema / grammar fields * json: deps management for primitive rules (+ allow null values) * json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?` * grammars: add troubleshooting section to readme * json: cap length of numbers to 15 digits before/after decimal point (avoids infinite gen, e.g. "one third" -> `0.333333333333...`) * json: unify all repetition code (w/ or w/o sep) * json: support string minLength/maxLength * server+json: update server/README w/ result_format * nits * json: fix type error w/ python 3.8 * json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions) * json: simplify DOT `{"type": "string", "pattern": "^.$"}` * json: remove recursion in opt_repetitions (avoids Python stack overflow) * json: rm dead code * json: rm useless assert & ggml.h import 2024-04-12 20:43:38 +02:00			`## Troubleshooting`

			`Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).`

			`### Efficient optional repetitions`

			A common pattern is to allow repetitions of a pattern `x` up to N times.

grammars: x{min,max} repetition operator (#6640) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com> 2024-06-06 11:07:06 +02:00			While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) may result in extremely slow sampling. Instead, you can write `x{0,N}` (or `(x (x (x ... (x)?...)?)?)?` w/ N-deep nesting in earlier llama.cpp versions).
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme 2024-06-11 02:00:30 +02:00
			`## Using GBNF grammars`

			`You can use GBNF grammars:`

			- In the [server](../examples/server)'s completion endpoints, passed as the `grammar` body field
			- In the [main](../examples/main) CLI, passed as the `--grammar` & `--grammar-file` flags
			`- With the [gbnf-validator](../examples/gbnf-validator) tool, to test them against strings.`

			`## JSON Schemas → GBNF`

			`llama.cpp` supports converting a subset of https://json-schema.org/ to GBNF grammars:

			`- In the [server](../examples/server):`
			- For any completion endpoints, passed as the `json_schema` body field
			- For the `/chat/completions` endpoint, passed inside the `result_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}`)
			- In the [main](../examples/main) CLI, passed as the `--json` / `-j` flag
			`- To convert to a grammar ahead of time:`
			`- in CLI, with [json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)`
			`- in JavaScript with [json-schema-to-grammar.mjs](../examples/server/public/json-schema-to-grammar.mjs) (this is used by the [server](../examples/server)'s Web UI)`

			`Take a look at [tests](../../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).`

			`Here is also a non-exhaustive list of unsupported features:`

			- `additionalProperties`: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840
			- `minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`
			- `integer` constraints to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797
			- Remote `$ref`s in the C++ version (Python & JavaScript versions fetch https refs)
			- Mixing `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
			- `string` formats `uri`, `email`
			- [`contains`](https://json-schema.org/draft/2020-12/json-schema-core#name-contains) / `minContains`
			- `uniqueItems`
			- `$anchor` (cf. [dereferencing](https://json-schema.org/draft/2020-12/json-schema-core#name-dereferencing))
			- [`not`](https://json-schema.org/draft/2020-12/json-schema-core#name-not)
			- [Conditionals](https://json-schema.org/draft/2020-12/json-schema-core#name-keywords-for-applying-subsche) `if` / `then` / `else` / `dependentSchemas`
			- [`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)