# llama.cpp INI Presets

## Introduction

The INI preset feature, introduced in [PR#17859](https://github.com/ggml-org/llama.cpp/pull/17859), allows users to create reusable and shareable parameter configurations for llama.cpp.

### Using Presets with the Server

When running multiple models on the server (router mode), INI preset files can be used to configure model-specific parameters. Please refer to the [server documentation](../tools/server/README.md) for more details.
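
For illustration, a minimal sketch of such a file, assuming the server accepts the same named-section format shown under Named Presets below (model names and repos are hypothetical):

```ini
; hypothetical router-mode preset: one section per model
[my-small-model]
hf       = username/my-small-model-GGUF
ctx-size = 4096

[my-large-model]
hf       = username/my-large-model-GGUF
ctx-size = 8192
```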

### Using a Remote Preset

> [!NOTE]
>
> This feature is currently only supported via the `-hf` option.

For GGUF models hosted on Hugging Face, you can include a `preset.ini` file in the root directory of the repository to define specific configurations for that model.

Example:

```ini
hf-repo-draft = username/my-draft-model-GGUF
temp = 0.5
top-k = 20
top-p = 0.95
```

For security reasons, only certain options are allowed. Please refer to [preset.cpp](../common/preset.cpp) for the complete list of permitted options.

Example usage:

Assuming your repository `username/my-model-with-preset` contains a `preset.ini` with the configuration above:

```sh
llama-cli -hf username/my-model-with-preset

# This is equivalent to:
llama-cli -hf username/my-model-with-preset \
  --hf-repo-draft username/my-draft-model-GGUF \
  --temp 0.5 \
  --top-k 20 \
  --top-p 0.95
```

You can also override preset arguments by specifying them on the command line:

```sh
# Force temp = 0.1, overriding the preset value
llama-cli -hf username/my-model-with-preset --temp 0.1
```

If you want to define multiple preset configurations for one or more GGUF models, you can create a blank HF repo for each preset. Each HF repo should contain a `preset.ini` file that references the actual model(s):

```ini
hf-repo = user/my-model-main
hf-repo-draft = user/my-model-draft
temp = 0.8
ctx-size = 1024
; (and other configurations)
```
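
You can then point `-hf` at the preset repo itself; the referenced model(s) are fetched via its `hf-repo` entries. A minimal sketch (the repo name is hypothetical):

```sh
# loads preset.ini from the blank repo; the model itself comes from hf-repo
llama-cli -hf user/my-preset-repo
```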

### Named Presets

Alternatively, you can keep several preset configurations in a single blank HF repo: its `preset.ini` defines one named section per preset, each referencing the actual model(s). The `[*]` section holds defaults applied to every preset:

```ini
[*]
mmap = 1

[gpt-oss-20b-hf]
hf          = ggml-org/gpt-oss-20b-GGUF
batch-size  = 2048
ubatch-size = 2048
top-p       = 1.0
top-k       = 0
min-p       = 0.01
temp        = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}

[gpt-oss-120b-hf]
hf          = ggml-org/gpt-oss-120b-GGUF
batch-size  = 2048
ubatch-size = 2048
top-p       = 1.0
top-k       = 0
min-p       = 0.01
temp        = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}
```

You can then select a preset by appending its section name as the tag, via `llama-cli` or `llama-server`. For example:

```sh
llama-server -hf user/repo:gpt-oss-120b-hf
```

Please make sure to provide the correct `hf-repo` (or `hf`, as in the example above) for each child preset; otherwise, you may get the error: `The specified tag is not a valid quantization scheme.`
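
For illustration, a minimal named preset that avoids this error pairs each section name with a valid model repo (names are hypothetical):

```ini
[my-preset]
hf   = username/my-model-GGUF
temp = 0.7
```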