path: root/llama.cpp/examples/parallel/README.md
author    Mitja Felicijan <mitja.felicijan@gmail.com>  2026-02-12 20:57:17 +0100
committer Mitja Felicijan <mitja.felicijan@gmail.com>  2026-02-12 20:57:17 +0100
commit    b333b06772c89d96aacb5490d6a219fba7c09cc6 (patch)
tree      211df60083a5946baa2ed61d33d8121b7e251b06 /llama.cpp/examples/parallel/README.md
download  llmnpc-b333b06772c89d96aacb5490d6a219fba7c09cc6.tar.gz
Engage!
Diffstat (limited to 'llama.cpp/examples/parallel/README.md')
-rw-r--r--  llama.cpp/examples/parallel/README.md  14
1 file changed, 14 insertions, 0 deletions
diff --git a/llama.cpp/examples/parallel/README.md b/llama.cpp/examples/parallel/README.md
new file mode 100644
index 0000000..2468a30
--- /dev/null
+++ b/llama.cpp/examples/parallel/README.md
@@ -0,0 +1,14 @@
+# llama.cpp/example/parallel
+
+Simplified simulation of serving incoming requests in parallel
+
+## Example
+
+Generate 128 client requests (`-ns 128`), simulating 8 concurrent clients (`-np 8`). The system prompt is shared (`-pps`), meaning that it is computed once at the start. Each client request consists of up to 10 junk questions (`--junk 10`) followed by the actual question.
+
+```bash
+llama-parallel -m model.gguf -np 8 -ns 128 --top-k 1 -pps --junk 10 -c 16384
+```
+
+> [!NOTE]
+> It's recommended to use base models with this example. Instruction-tuned models might not be able to properly follow the custom chat template specified here, so the results might not be as expected.
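A quick back-of-envelope reading of the flags in the committed README (this note is illustrative, not part of the diff): with `-ns 128` total requests served across `-np 8` parallel slots, and assuming every slot stays busy, the simulation works through the requests in roughly 128/8 = 16 full waves:

```shell
# Rough capacity check for the example flags (assumes all -np slots stay busy).
ns=128   # total client requests (-ns 128)
np=8     # simulated concurrent clients (-np 8)
echo $(( ns / np ))   # → 16 full waves of requests
```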