# llama.cpp INI Presets

## Introduction

The INI preset feature, introduced in [PR#17859](https://github.com/ggml-org/llama.cpp/pull/17859), lets users create reusable, shareable parameter configurations for llama.cpp.

### Using Presets with the Server

When running multiple models on the server (router mode), INI preset files can be used to configure model-specific parameters. Please refer to the [server documentation](../tools/server/README.md) for more details.

### Using a Remote Preset

> [!NOTE]
>
> This feature is currently only supported via the `-hf` option.

For GGUF models hosted on Hugging Face, you can include a `preset.ini` file in the root directory of the repository to define a specific configuration for that model.

Example:

```ini
hf-repo-draft = username/my-draft-model-GGUF
temp = 0.5
top-k = 20
top-p = 0.95
```

For security reasons, only certain options are allowed. Please refer to [preset.cpp](../common/preset.cpp) for the complete list of permitted options.

Example usage, assuming your repository `username/my-model-with-preset` contains a `preset.ini` with the configuration above:

```sh
llama-cli -hf username/my-model-with-preset

# This is equivalent to:
llama-cli -hf username/my-model-with-preset \
    --hf-repo-draft username/my-draft-model-GGUF \
    --temp 0.5 \
    --top-k 20 \
    --top-p 0.95
```

You can also override preset values by specifying the corresponding arguments on the command line:

```sh
# Force temp = 0.1, overriding the preset value
llama-cli -hf username/my-model-with-preset --temp 0.1
```

If you want to define multiple preset configurations for one or more GGUF models, you can create a blank HF repo for each preset.
Each HF repo should contain a `preset.ini` file that references the actual model(s):

```ini
hf-repo = user/my-model-main
hf-repo-draft = user/my-model-draft
temp = 0.8
ctx-size = 1024
; (and other configurations)
```

### Named presets

Alternatively, you can create a single blank HF repo whose `preset.ini` file defines multiple named presets, each referencing the actual model(s):

```ini
[*]
mmap = 1

[gpt-oss-20b-hf]
hf = ggml-org/gpt-oss-20b-GGUF
batch-size = 2048
ubatch-size = 2048
top-p = 1.0
top-k = 0
min-p = 0.01
temp = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}

[gpt-oss-120b-hf]
hf = ggml-org/gpt-oss-120b-GGUF
batch-size = 2048
ubatch-size = 2048
top-p = 1.0
top-k = 0
min-p = 0.01
temp = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}
```

You can then select a named preset via `llama-cli` or `llama-server`, for example:

```sh
llama-server -hf user/repo:gpt-oss-120b-hf
```

Please make sure to provide the correct `hf-repo` for each child preset; otherwise, you may get the error: `The specified tag is not a valid quantization scheme.`
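As an aside, the way a `[*]` section combines with a named section can be sketched with Python's `configparser`, treating `[*]` as the defaults section. This is an illustration only, under the assumption that `[*]` supplies defaults for every named preset; llama.cpp's actual parsing lives in [preset.cpp](../common/preset.cpp) and may differ in details.

```python
# Sketch: resolving a named preset against "[*]" defaults.
# Assumption (not from llama.cpp source): "[*]" behaves like a
# defaults section whose keys apply to every named preset.
import configparser

PRESET_INI = """
[*]
mmap = 1

[gpt-oss-20b-hf]
hf = ggml-org/gpt-oss-20b-GGUF
temp = 1.0
"""

# default_section="*" makes configparser expose the "[*]" keys
# inside every other section.
parser = configparser.ConfigParser(default_section="*")
parser.read_string(PRESET_INI)

# "mmap" comes from "[*]"; "hf" and "temp" from the named section.
resolved = dict(parser["gpt-oss-20b-hf"])
```

A named section can also override a `[*]` key simply by redefining it, since section values take precedence over defaults in this model.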
