An experiment using tiny LLMs as NPCs that could be embedded into a game.

Embed models into the game, build a simple vector database from text, embed
prompts, retrieve top-k by cosine similarity, and feed context into tiny
CPU LLMs for NPC interactions.

**No external API calls.** Everything is local, directly using GGUF models
and [llama.cpp](https://github.com/ggml-org/llama.cpp) for inference.

https://github.com/user-attachments/assets/863b75eb-0da7-4235-8112-f00bc82d81f6

> [!NOTE]
> This project is just for fun, to see how LLMs fare as NPCs. Because of the
> non-deterministic nature of LLMs, the results vary and are often quite funny.
> Making this genuinely useful in real games would take a lot of tweaking, but
> it is not impossible.

Goals of the experiment:

- Run the LLMs entirely on the CPU; small models were chosen for this
  experiment so that they can be embedded into other games.
- Produce a simple C library that can be reused elsewhere.
- Test existing small and tiny LLMs and report useful results on how they
  behave.

## Getting started

1. Build dependencies and binaries:
   ```bash
   make build/llama.cpp
   make run/fetch-models
   make build/context
   make build/game
   ```

2. Build a vector context database:
   ```bash
   ./context -m qwen3 -i corpus/map1_keldor.txt -o corpus/map1_keldor.vdb
   ```

3. Run the game:
   ```bash
   ./game -m phi-4-mini-instruct -e qwen3
   ```

## Building

### Prerequisites

- C compiler (gcc/clang)
- CMake
- Docker (optional, for containerized use of binaries)

### Build Steps

1. Build llama.cpp libraries:
   ```bash
   make build/llama.cpp
   ```

2. Download models:
   ```bash
   make run/fetch-models
   ```

3. Build binaries:
   ```bash
   make build/context
   make build/prompts
   make build/npc
   make build/game
   ```

## Usage

### Build a vector context database

`context` reads a text file (one document per line), embeds each line, and
produces a binary vector database file. For best results, use a dedicated
embedding model (for example, `qwen3`) even if you generate answers with a
different model.

```bash
./context -m qwen3 -i corpus/map1_keldor.txt -o corpus/map1_keldor.vdb
```
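
The "one document per line" convention can be sketched in plain C. This split
helper is illustrative only, not the project's actual code (the real tool also
embeds each line and writes the records out):

```c
#include <stddef.h>
#include <string.h>

/* Split a corpus buffer into one document per line, mirroring how
 * `context` treats its input file. Modifies buf in place. */
static size_t split_lines(char *buf, char **docs, size_t max_docs) {
    size_t n = 0;
    for (char *line = strtok(buf, "\n");
         line != NULL && n < max_docs;
         line = strtok(NULL, "\n")) {
        docs[n++] = line; /* each line becomes one document to embed */
    }
    return n;
}
```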

### Run an NPC query with retrieved context

`npc` loads a vector database, embeds the prompt, selects the top 5 matching
lines by cosine similarity, and runs the NPC system prompt against that context.
You can pass a separate embedding model with `-e`/`--embed-model`.

```bash
./npc -m phi-4-mini-instruct -e qwen3 -p "Who is Keldor?" -c corpus/map1_keldor.vdb
./npc -m qwen3 -e qwen3 -p "What does Keldor believe about the marsh lights?" -c corpus/map1_keldor.vdb
```

### Run the game

The game uses the same models and retrieval pipeline, with short NPC replies.

```bash
./game -m phi-4-mini-instruct -e qwen3
```

### context options

| Flag | Description |
|------|-------------|
| `-m, --model` | Embedding model to use (default: first model in config) |
| `-i, --in` | Input context text file (required) |
| `-o, --out` | Output vector database file (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |

### npc options

| Flag | Description |
|------|-------------|
| `-m, --model` | Model to use (required) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-p, --prompt` | Prompt text (required) |
| `-c, --context` | Context vector database file (.vdb) (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |

### game options

| Flag | Description |
|------|-------------|
| `-m, --model` | Model to use (default: first model in config) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |

## Models

Configure models in `models.h`. The default model is the first entry in the
`models` array; each entry points at a local GGUF file under `models/`.
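
A hedged sketch of what such a `models.h`-style table could look like; the
struct fields, file names, and lookup helper here are assumptions for
illustration, not the project's actual definitions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model registry; the real models.h may differ. */
typedef struct {
    const char *name; /* name accepted by -m / -e */
    const char *path; /* local GGUF file under models/ */
} model_entry;

static const model_entry models[] = {
    /* The first entry acts as the default model. */
    { "qwen3",               "models/qwen3.gguf" },
    { "phi-4-mini-instruct", "models/phi-4-mini-instruct.gguf" },
};

/* Resolve a -m/-e argument to a GGUF path, or NULL if unknown. */
static const char *model_path(const char *name) {
    for (size_t i = 0; i < sizeof(models) / sizeof(models[0]); i++)
        if (strcmp(models[i].name, name) == 0)
            return models[i].path;
    return NULL;
}
```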

## Vector database format

`context` produces a binary file with a fixed header and a contiguous list of
documents. The header includes a magic value, version, embedding size, maximum
text length, and document count. Each document stores the original text (fixed
size `VDB_MAX_TEXT`) and its embedding (`VDB_EMBED_SIZE`).
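
In C terms, that layout could look like the sketch below. The field order,
integer widths, and the two size constants are assumptions for illustration;
the authoritative definitions live in the project source.

```c
#include <stdint.h>

/* Assumed values; the real constants are defined in the project source. */
#define VDB_MAX_TEXT   512   /* bytes reserved for each document's text */
#define VDB_EMBED_SIZE 1024  /* floats per embedding vector */

/* Fixed file header, written once at the start of the .vdb file. */
typedef struct {
    uint32_t magic;      /* identifies the file type */
    uint32_t version;    /* format version */
    uint32_t embed_size; /* floats per embedding (VDB_EMBED_SIZE) */
    uint32_t max_text;   /* bytes per text field (VDB_MAX_TEXT) */
    uint32_t n_docs;     /* number of document records that follow */
} vdb_header;

/* One fixed-size document record; n_docs of these follow the header. */
typedef struct {
    char  text[VDB_MAX_TEXT];        /* original line, zero-padded */
    float embedding[VDB_EMBED_SIZE]; /* embedding vector */
} vdb_doc;
```

Fixed-size records make lookup trivial: document `i` starts at byte offset
`sizeof(vdb_header) + i * sizeof(vdb_doc)`.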

## Docker

```bash
make run/docker
```

Builds a Docker image and runs an interactive shell with the binaries and
models under `/app/`.

## Cleaning

```bash
make run/clean
```

## Reading material

- https://www.tinyllm.org/
- https://en.wikipedia.org/wiki/Cosine_similarity