An experiment using tiny LLMs as NPCs that could be embedded into a game.
> [!NOTE]
> This project is just for fun, to see how LLMs fare as NPCs. Because of the
> non-deterministic nature of LLMs, the results vary and are often quite funny.
> A lot of tweaking would be needed to make this genuinely useful in a real
> game, but it is not impossible.

Goals of the experiment:

- Run the LLM on CPU only. This is why small LLMs were chosen for this
  experiment, so they can be embedded into other games.
- Produce a simple C library that can be reused elsewhere.
- Test existing small and tiny LLMs and report useful results on how they
  behave.
## Building
### Prerequisites
- C compiler (gcc/clang)
- CMake
- Docker (optional, for containerized use of binaries)
### Build Steps
1. Build llama.cpp libraries:
```bash
make build/llama.cpp
```
2. Download models:
```bash
make run/fetch-models
```
3. Build binaries:
```bash
make build/context
make build/prompts
make build/npc
make build/game
```
## Usage
### Build a vector context database
`context` reads a text file (one document per line), embeds each line, and
produces a binary vector database file. For best results, use a dedicated
embedding model (for example, `qwen3`) even if you generate answers with a
different model.
```bash
./context -m qwen3 -i corpus/lotr.txt -o corpus/lotr.vdb
```
### Run an NPC query with retrieved context
`npc` loads a vector database, embeds the prompt, selects the top 5 matching
lines by cosine similarity, and runs the NPC system prompt against that context.
You can pass a separate embedding model with `-e`/`--embed-model`.
```bash
./npc -m phi-4-mini-instruct -e qwen3 -p "Who is Gandalf?" -c corpus/lotr.vdb
./npc -m qwen3 -e qwen3 -p "Who is Frodo?" -c corpus/lotr.vdb
```
### Run the game
The game uses the same models and retrieval pipeline, with short NPC replies.
```bash
./game -m phi-4-mini-instruct -e qwen3
```
### context options
| Flag | Description |
|------|-------------|
| `-m, --model` | Embedding model to use (default: first model in config) |
| `-i, --in` | Input context text file (required) |
| `-o, --out` | Output vector database file (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |
### npc options
| Flag | Description |
|------|-------------|
| `-m, --model` | Model to use (required) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-p, --prompt` | Prompt text (required) |
| `-c, --context` | Context vector database file (.vdb) (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |
### game options
| Flag | Description |
|------|-------------|
| `-m, --model` | Model to use (default: first model in config) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |
## Models
Configure models in `models.h`. The default model is the first entry in the
`models` array; each entry points at a local GGUF file under `models/`.
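A plausible sketch of what such a model table might look like (the struct fields, entry names, and paths below are assumptions for illustration; the real definitions live in `models.h`):

```c
/* Hypothetical sketch; the actual struct is defined in models.h. */
typedef struct {
    const char *name;  /* name passed to -m/--model */
    const char *path;  /* local GGUF file under models/ */
} model_entry;

static const model_entry models[] = {
    /* the first entry is the default model */
    { "qwen3",               "models/qwen3.gguf" },
    { "phi-4-mini-instruct", "models/phi-4-mini-instruct.gguf" },
};
```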
## Vector database format
`context` produces a binary file with a fixed header and a contiguous list of
documents. The header includes a magic value, version, embedding size, maximum
text length, and document count. Each document stores the original text (fixed
size `VDB_MAX_TEXT`) and its embedding (`VDB_EMBED_SIZE`).
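Under that description, the on-disk structures might look like the following (field names and the constant values are illustrative assumptions, not the project's actual headers):

```c
#include <stdint.h>

/* Illustrative values; the real constants are set by the project. */
#define VDB_MAX_TEXT   256
#define VDB_EMBED_SIZE 1024

/* Fixed header, written once at the start of the file. */
typedef struct {
    uint32_t magic;      /* identifies the file as a vector database */
    uint32_t version;    /* format version */
    uint32_t embed_size; /* floats per embedding (VDB_EMBED_SIZE) */
    uint32_t max_text;   /* bytes per text field (VDB_MAX_TEXT) */
    uint32_t doc_count;  /* number of documents that follow */
} vdb_header;

/* One document record; doc_count of these follow the header. */
typedef struct {
    char  text[VDB_MAX_TEXT];        /* original line of text */
    float embedding[VDB_EMBED_SIZE]; /* embedding vector */
} vdb_document;
```

Storing the header's `embed_size` and `max_text` lets a reader reject a database built with different constants.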
## Docker
```bash
make run/docker
```
Builds a Docker image and runs an interactive shell with the binaries and
models under `/app/`.
## Cleaning
```bash
make run/clean
```
## Reading material
- https://www.tinyllm.org/
- https://en.wikipedia.org/wiki/Cosine_similarity