1# GBNF Guide
  2
  3GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/cli`, `tools/completion` and `tools/server`.
  4
  5## Background
  6
  7[Backus-Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.
  8
  9## Basics
 10
 11In GBNF, we define *production rules* that specify how a *non-terminal* (rule name) can be replaced with sequences of *terminals* (characters, specifically Unicode [code points](https://en.wikipedia.org/wiki/Code_point)) and other non-terminals. The basic format of a production rule is `nonterminal ::= sequence...`.
 12
 13## Example
 14
 15Before going deeper, let's look at some of the features demonstrated in `grammars/chess.gbnf`, a small chess notation grammar:
 16```
 17# `root` specifies the pattern for the overall output
 18root ::= (
 19    # it must start with the characters "1. " followed by a sequence
 20    # of characters that match the `move` rule, followed by a space, followed
 21    # by another move, and then a newline
 22    "1. " move " " move "\n"
 23
 24    # it's followed by one or more subsequent moves, numbered with one or two digits
 25    ([1-9] [0-9]? ". " move " " move "\n")+
 26)
 27
 28# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
 29# The `[+#]?` denotes the possibility of checking or mate signs after moves
 30move ::= (pawn | nonpawn | castle) [+#]?
 31
 32pawn ::= ...
 33nonpawn ::= ...
 34castle ::= ...
 35```
 36
 37## Non-Terminals and Terminals
 38
 39Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like `move`, `castle`, or `check-mate`.
 40
 41Terminals are actual characters ([code points](https://en.wikipedia.org/wiki/Code_point)). They can be specified as a sequence like `"1"` or `"O-O"` or as ranges like `[1-9]` or `[NBKQR]`.
 42
 43## Characters and character ranges
 44
 45Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example `hiragana ::= [ใ-ใ‚Ÿ]`, or with escapes: 8-bit (`\xXX`), 16-bit (`\uXXXX`) or 32-bit (`\UXXXXXXXX`).
 46
 47Character ranges can be negated with `^`:
 48```
 49single-line ::= [^\n]+ "\n"
 50```
 51
 52## Sequences and Alternatives
 53
 54The order of symbols in a sequence matters. For example, in `"1. " move " " move "\n"`, the `"1. "` must come before the first `move`, etc.
 55
 56Alternatives, denoted by `|`, give different sequences that are acceptable. For example, in `move ::= pawn | nonpawn | castle`, `move` can be a `pawn` move, a `nonpawn` move, or a `castle`.
 57
 58Parentheses `()` can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
 59
 60## Repetition and Optional Symbols
 61
 62- `*` after a symbol or sequence means that it can be repeated zero or more times (equivalent to `{0,}`).
 63- `+` denotes that the symbol or sequence should appear one or more times (equivalent to `{1,}`).
 64- `?` makes the preceding symbol or sequence optional (equivalent to `{0,1}`).
 65- `{m}` repeats the precedent symbol or sequence exactly `m` times
 66- `{m,}` repeats the precedent symbol or sequence at least `m` times
 67- `{m,n}` repeats the precedent symbol or sequence at between `m` and `n` times (included)
 68- `{0,n}` repeats the precedent symbol or sequence at most `n` times (included)
 69
 70## Tokens
 71
 72Tokens allow grammars to match specific tokenizer tokens rather than character sequences. This is useful for constraining outputs based on special tokens (like `<think>` or `</think>`).
 73
 74Tokens can be specified in two ways:
 75
 761. **Token ID**: Use angle brackets with the token ID in square brackets: `<[token-id]>`. For example, `<[1000]>` matches the token with ID 1000.
 77
 782. **Token string**: Use angle brackets with the token text directly: `<token>`. For example, `<think>` will match the token whose text is exactly `<think>`. This only works if the string tokenizes to exactly one token in the vocabulary, otherwise the grammar will fail to parse.
 79
 80You can negate token matches using the `!` prefix: `!<[1000]>` or `!<think>` matches any token *except* the specified one.
 81
 82```
 83# Match a thinking block: <think>...</think>
 84# Using token strings (requires these to be single tokens in the vocab)
 85root ::= <think> thinking </think> .*
 86thinking ::= !</think>*
 87
 88# Equivalent grammar using explicit token IDs
 89# Assumes token 1000 = <think>, token 1001 = </think>
 90root ::= <[1000]> thinking <[1001]> .*
 91thinking ::= !<[1001]>*
 92```
 93
 94## Comments and newlines
 95
 96Comments can be specified with `#`:
 97```
 98# defines optional whitespace
 99ws ::= [ \t\n]+
100```
101
102Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker `|` will continue the current rule, even outside of parentheses.
103
104## The root rule
105
106In a full grammar, the `root` rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.
107
108```
109# a grammar for lists
110root ::= ("- " item)+
111item ::= [^\n]+ "\n"
112```
113
114## Next steps
115
116This guide provides a brief overview. Check out the GBNF files in this directory (`grammars/`) for examples of full grammars. You can try them out with:
117```
118./llama-cli -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
119```
120
121`llama.cpp` can also convert JSON schemas to grammars either ahead of time or at each request, see below.
122
123## Troubleshooting
124
125Grammars currently have performance gotchas (see https://github.com/ggml-org/llama.cpp/issues/4218).
126
127### Efficient optional repetitions
128
129A common pattern is to allow repetitions of a pattern `x` up to N times.
130
131While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) may result in extremely slow sampling. Instead, you can write `x{0,N}` (or `(x (x (x ... (x)?...)?)?)?` w/ N-deep nesting in earlier llama.cpp versions).
132
133## Using GBNF grammars
134
135You can use GBNF grammars:
136
137- In [llama-server](../tools/server)'s completion endpoints, passed as the `grammar` body field
138- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--grammar` & `--grammar-file` flags
139- With [test-gbnf-validator](../tests/test-gbnf-validator.cpp), to test them against strings.
140
141## JSON Schemas โ†’ GBNF
142
143`llama.cpp` supports converting a subset of https://json-schema.org/ to GBNF grammars:
144
145- In [llama-server](../tools/server):
146    - For any completion endpoints, passed as the `json_schema` body field
147    - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}` or `{ type: "json_schema", json_schema: {"schema": ...} }`)
148- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--json` / `-j` flag
149- To convert to a grammar ahead of time:
150    - in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
151    - in JavaScript with [json-schema-to-grammar.mjs](../tools/server/public_legacy/json-schema-to-grammar.mjs) (this is used by the [server](../tools/server)'s Web UI)
152
153> [!NOTE]
154> The JSON schema is only used to constrain the model output and is not injected into the prompt. The model has no visibility into the schema, so if you want it to understand the expected structure, describe it explicitly in your prompt. This does not apply to tool calling, where schemas are injected into the prompt.
155
156Take a look at [tests](../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggml-org/llama.cpp/pull/5978, https://github.com/ggml-org/llama.cpp/pull/6659 & https://github.com/ggml-org/llama.cpp/pull/6555).
157
158```bash
159llama-cli \
160  -hfr bartowski/Phi-3-medium-128k-instruct-GGUF \
161  -hff Phi-3-medium-128k-instruct-Q8_0.gguf \
162  -j '{
163    "type": "array",
164    "items": {
165        "type": "object",
166        "properties": {
167            "name": {
168                "type": "string",
169                "minLength": 1,
170                "maxLength": 100
171            },
172            "age": {
173                "type": "integer",
174                "minimum": 0,
175                "maximum": 150
176            }
177        },
178        "required": ["name", "age"],
179        "additionalProperties": false
180    },
181    "minItems": 10,
182    "maxItems": 100
183  }' \
184  -p 'Generate a {name, age}[] JSON array with famous actors of all ages.'
185```
186
187<details>
188
189<summary>Show grammar</summary>
190
191You can convert any schema in command-line with:
192
193```bash
194examples/json_schema_to_grammar.py name-age-schema.json
195```
196
197```
198char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
199item ::= "{" space item-name-kv "," space item-age-kv "}" space
200item-age ::= ([0-9] | ([1-8] [0-9] | [9] [0-9]) | "1" ([0-4] [0-9] | [5] "0")) space
201item-age-kv ::= "\"age\"" space ":" space item-age
202item-name ::= "\"" char{1,100} "\"" space
203item-name-kv ::= "\"name\"" space ":" space item-name
204root ::= "[" space item ("," space item){9,99} "]" space
205space ::= | " " | "\n" [ \t]{0,20}
206```
207
208</details>
209
210Here is also a list of known limitations (contributions welcome):
211
212- `additionalProperties` defaults to `false` (produces faster grammars + reduces hallucinations).
213- `"additionalProperties": true` may produce keys that contain unescaped newlines.
214- Unsupported features are skipped silently. It is currently advised to use the command-line Python converter (see above) to see any warnings, and to inspect the resulting grammar / test it w/ [llama-gbnf-validator](../examples/gbnf-validator/gbnf-validator.cpp).
215- Can't mix `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggml-org/llama.cpp/issues/7703)
216- [prefixItems](https://json-schema.org/draft/2020-12/json-schema-core#name-prefixitems) is broken (but [items](https://json-schema.org/draft/2020-12/json-schema-core#name-items) works)
217- `minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`: only supported for `"type": "integer"` for now, not `number`
218- Nested `$ref`s are broken (https://github.com/ggml-org/llama.cpp/issues/8073)
219- [pattern](https://json-schema.org/draft/2020-12/json-schema-validation#name-pattern)s must start with `^` and end with `$`
220- Remote `$ref`s not supported in the C++ version (Python & JavaScript versions fetch https refs)
221- `string` [formats](https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats) lack `uri`, `email`
222- No [`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)
223
224And a non-exhaustive list of other unsupported features that are unlikely to be implemented (hard and/or too slow to support w/ stateless grammars):
225
226- [`uniqueItems`](https://json-schema.org/draft/2020-12/json-schema-validation#name-uniqueitems)
227- [`contains`](https://json-schema.org/draft/2020-12/json-schema-core#name-contains) / `minContains`
228- `$anchor` (cf. [dereferencing](https://json-schema.org/draft/2020-12/json-schema-core#name-dereferencing))
229- [`not`](https://json-schema.org/draft/2020-12/json-schema-core#name-not)
230- [Conditionals](https://json-schema.org/draft/2020-12/json-schema-core#name-keywords-for-applying-subsche) `if` / `then` / `else` / `dependentSchemas`
231
232### A word about additionalProperties
233
234> [!WARNING]
235> The JSON schemas spec states `object`s accept [additional properties](https://json-schema.org/understanding-json-schema/reference/object#additionalproperties) by default.
236> Since this is slow and seems prone to hallucinations, we default to no additional properties.
237> You can set `"additionalProperties": true` in the the schema of any object to explicitly allow additional properties.
238
239If you're using [Pydantic](https://pydantic.dev/) to generate schemas, you can enable additional properties with the `extra` config on each model class:
240
241```python
242# pip install pydantic
243import json
244from typing import Annotated, List
245from pydantic import BaseModel, Extra, Field
246class QAPair(BaseModel):
247    class Config:
248        extra = 'allow'  # triggers additionalProperties: true in the JSON schema
249    question: str
250    concise_answer: str
251    justification: str
252
253class Summary(BaseModel):
254    class Config:
255        extra = 'allow'
256    key_facts: List[Annotated[str, Field(pattern='- .{5,}')]]
257    question_answers: List[Annotated[List[QAPair], Field(min_items=5)]]
258
259print(json.dumps(Summary.model_json_schema(), indent=2))
260```
261
262<details>
263<summary>Show JSON schema & grammar</summary>
264
265```json
266{
267  "$defs": {
268    "QAPair": {
269      "additionalProperties": true,
270      "properties": {
271        "question": {
272          "title": "Question",
273          "type": "string"
274        },
275        "concise_answer": {
276          "title": "Concise Answer",
277          "type": "string"
278        },
279        "justification": {
280          "title": "Justification",
281          "type": "string"
282        }
283      },
284      "required": [
285        "question",
286        "concise_answer",
287        "justification"
288      ],
289      "title": "QAPair",
290      "type": "object"
291    }
292  },
293  "additionalProperties": true,
294  "properties": {
295    "key_facts": {
296      "items": {
297        "pattern": "^- .{5,}$",
298        "type": "string"
299      },
300      "title": "Key Facts",
301      "type": "array"
302    },
303    "question_answers": {
304      "items": {
305        "items": {
306          "$ref": "#/$defs/QAPair"
307        },
308        "minItems": 5,
309        "type": "array"
310      },
311      "title": "Question Answers",
312      "type": "array"
313    }
314  },
315  "required": [
316    "key_facts",
317    "question_answers"
318  ],
319  "title": "Summary",
320  "type": "object"
321}
322```
323
324```
325QAPair ::= "{" space QAPair-question-kv "," space QAPair-concise-answer-kv "," space QAPair-justification-kv ( "," space ( QAPair-additional-kv ( "," space QAPair-additional-kv )* ) )? "}" space
326QAPair-additional-k ::= ["] ( [c] ([o] ([n] ([c] ([i] ([s] ([e] ([_] ([a] ([n] ([s] ([w] ([e] ([r] char+ | [^"r] char*) | [^"e] char*) | [^"w] char*) | [^"s] char*) | [^"n] char*) | [^"a] char*) | [^"_] char*) | [^"e] char*) | [^"s] char*) | [^"i] char*) | [^"c] char*) | [^"n] char*) | [^"o] char*) | [j] ([u] ([s] ([t] ([i] ([f] ([i] ([c] ([a] ([t] ([i] ([o] ([n] char+ | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"a] char*) | [^"c] char*) | [^"i] char*) | [^"f] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"u] char*) | [q] ([u] ([e] ([s] ([t] ([i] ([o] ([n] char+ | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"e] char*) | [^"u] char*) | [^"cjq] char* )? ["] space
327QAPair-additional-kv ::= QAPair-additional-k ":" space value
328QAPair-concise-answer-kv ::= "\"concise_answer\"" space ":" space string
329QAPair-justification-kv ::= "\"justification\"" space ":" space string
330QAPair-question-kv ::= "\"question\"" space ":" space string
331additional-k ::= ["] ( [k] ([e] ([y] ([_] ([f] ([a] ([c] ([t] ([s] char+ | [^"s] char*) | [^"t] char*) | [^"c] char*) | [^"a] char*) | [^"f] char*) | [^"_] char*) | [^"y] char*) | [^"e] char*) | [q] ([u] ([e] ([s] ([t] ([i] ([o] ([n] ([_] ([a] ([n] ([s] ([w] ([e] ([r] ([s] char+ | [^"s] char*) | [^"r] char*) | [^"e] char*) | [^"w] char*) | [^"s] char*) | [^"n] char*) | [^"a] char*) | [^"_] char*) | [^"n] char*) | [^"o] char*) | [^"i] char*) | [^"t] char*) | [^"s] char*) | [^"e] char*) | [^"u] char*) | [^"kq] char* )? ["] space
332additional-kv ::= additional-k ":" space value
333array ::= "[" space ( value ("," space value)* )? "]" space
334boolean ::= ("true" | "false") space
335char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
336decimal-part ::= [0-9]{1,16}
337dot ::= [^\x0A\x0D]
338integral-part ::= [0] | [1-9] [0-9]{0,15}
339key-facts ::= "[" space (key-facts-item ("," space key-facts-item)*)? "]" space
340key-facts-item ::= "\"" "- " key-facts-item-1{5,} "\"" space
341key-facts-item-1 ::= dot
342key-facts-kv ::= "\"key_facts\"" space ":" space key-facts
343null ::= "null" space
344number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
345object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
346question-answers ::= "[" space (question-answers-item ("," space question-answers-item)*)? "]" space
347question-answers-item ::= "[" space question-answers-item-item ("," space question-answers-item-item){4,} "]" space
348question-answers-item-item ::= QAPair
349question-answers-kv ::= "\"question_answers\"" space ":" space question-answers
350root ::= "{" space key-facts-kv "," space question-answers-kv ( "," space ( additional-kv ( "," space additional-kv )* ) )? "}" space
351space ::= | " " | "\n" [ \t]{0,20}
352string ::= "\"" char* "\"" space
353value ::= object | array | string | number | boolean | null
354```
355
356</details>
357
358If you're using [Zod](https://zod.dev/), you can make your objects to explicitly allow extra properties w/ `nonstrict()` / `passthrough()` (or explicitly no extra props w/ `z.object(...).strict()` or `z.strictObject(...)`) but note that [zod-to-json-schema](https://github.com/StefanTerdell/zod-to-json-schema) currently always sets `"additionalProperties": false` anyway.
359
360```js
361import { z } from 'zod';
362import { zodToJsonSchema } from 'zod-to-json-schema';
363
364const Foo = z.object({
365  age: z.number().positive(),
366  email: z.string().email(),
367}).strict();
368
369console.log(zodToJsonSchema(Foo));
370```
371
372<details>
373<summary>Show JSON schema & grammar</summary>
374
375```json
376{
377  "type": "object",
378  "properties": {
379    "age": {
380      "type": "number",
381      "exclusiveMinimum": 0
382    },
383    "email": {
384      "type": "string",
385      "format": "email"
386    }
387  },
388  "required": [
389    "age",
390    "email"
391  ],
392  "additionalProperties": false,
393  "$schema": "http://json-schema.org/draft-07/schema#"
394}
395```
396
397```
398age-kv ::= "\"age\"" space ":" space number
399char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
400decimal-part ::= [0-9]{1,16}
401email-kv ::= "\"email\"" space ":" space string
402integral-part ::= [0] | [1-9] [0-9]{0,15}
403number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
404root ::= "{" space age-kv "," space email-kv "}" space
405space ::= | " " | "\n" [ \t]{0,20}
406string ::= "\"" char* "\"" space
407```
408
409</details>