llmnpc - Testing how LLMs fare as NPCs

An experiment using tiny LLMs as NPCs that can be embedded directly into a game.

The approach: embed the models in the game, build a simple vector database from a text corpus, embed each prompt, retrieve the top-k lines by cosine similarity, and feed that context to a tiny LLM running on the CPU to produce the NPC's reply.

No external API calls: everything runs locally, loading GGUF models directly and using llama.cpp for inference.

https://github.com/user-attachments/assets/863b75eb-0da7-4235-8112-f00bc82d81f6

> [!NOTE]
> This project is just for fun, to see how LLMs would fare as NPCs. Because LLM output is non-deterministic, the results vary and are often quite funny. Making this genuinely useful in a real game would take a lot of tweaking, but it is not impossible.

Goals of the experiment:

- Run the LLMs on the CPU only. This is why small models were chosen for the experiment: so that the approach can be reused in other games.
- Produce a simple C library that can be reused elsewhere.
- Test existing small and tiny LLMs and report useful results on how they behave.

Getting started

Build dependencies and binaries:

```
make build/llama.cpp
make run/fetch-models
make build/context
make build/game
```

Build a vector context database:
```
build/corpus
```
Run the game:
```
./game -m phi-4-mini-instruct -e qwen3
```

Building

Prerequisites

- C compiler (gcc or clang)
- CMake
- Docker (optional, for running the binaries in a container)

Build steps

Build llama.cpp libraries:
```
make build/llama.cpp
```
Download models:
```
make run/fetch-models
```

Build binaries:

```
make build/context
make build/prompts
make build/npc
make build/game
```

Usage

Build a vector context database

`context` reads a text file (one document per line), embeds each line, and produces a binary vector database file. For best results, use a dedicated embedding model (for example, `qwen3`) even if you generate answers with a different model.

```
./context -m qwen3 -i corpus/map1_keldor.txt -o corpus/map1_keldor.vdb
```
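The input side of `context` can be pictured as a line reader that fills fixed-size text slots before each line is embedded. This is a hypothetical sketch, not the project's code. `read_documents` and `MAX_TEXT` are illustrative names, and 256 is a placeholder for the real `VDB_MAX_TEXT`. Skipping blank lines and truncating long ones are assumptions about how such a tool could behave.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TEXT 256 /* placeholder for the project's VDB_MAX_TEXT; real value unknown */

/* Read one document per line from `in` into fixed-size buffers.
 * Trailing "\n" / "\r\n" are stripped, blank lines are skipped (an assumption),
 * and lines longer than MAX_TEXT - 1 bytes are truncated.
 * Returns the number of documents stored. */
static size_t read_documents(FILE *in, char docs[][MAX_TEXT], size_t max_docs)
{
    char line[MAX_TEXT];
    size_t count = 0;

    while (count < max_docs && fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int complete = len > 0 && line[len - 1] == '\n';

        /* The line did not fit in the buffer: discard the rest of it. */
        if (!complete) {
            int ch;
            while ((ch = fgetc(in)) != EOF && ch != '\n')
                ;
        }

        /* Strip the trailing newline and any carriage return. */
        while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
            line[--len] = '\0';

        if (len == 0)
            continue; /* skip blank lines */

        memcpy(docs[count], line, len + 1); /* includes the terminating NUL */
        count++;
    }
    return count;
}
```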

Run an NPC query with retrieved context

`npc` loads a vector database, embeds the prompt, selects the top 5 matching lines by cosine similarity, and runs the NPC system prompt against that context. You can pass a separate embedding model with `-e`/`--embed-model`.

```
./npc -m phi-4-mini-instruct -e qwen3 -p "Who is Keldor?" -c corpus/map1_keldor.vdb
./npc -m qwen3 -e qwen3 -p "What does Keldor believe about the marsh lights?" -c corpus/map1_keldor.vdb
```
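Selecting the 5 best lines does not require sorting the whole database. When k is this small, an insertion-based top-k is enough. The sketch below is hypothetical (`top_k` is not taken from the project) and assumes the cosine similarities have already been computed into a `float` array, one score per stored line.

```c
#include <stddef.h>

/* Write the indices of the k highest scores into `out`, best first.
 * Returns min(k, n). O(n * k), which is fine for small k such as 5.
 * On ties, the earlier index stays ahead. Hypothetical helper. */
static size_t top_k(const float *scores, size_t n, size_t *out, size_t k)
{
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) {
        /* Walk up from the end to find where scores[i] belongs. */
        size_t pos = filled;
        while (pos > 0 && scores[out[pos - 1]] < scores[i])
            pos--;

        if (pos >= k)
            continue; /* not better than any of the current top k */

        /* Shift worse entries down one slot; drop the last one if full. */
        size_t end = filled < k ? filled : k - 1;
        for (size_t j = end; j > pos; j--)
            out[j] = out[j - 1];
        out[pos] = i;
        if (filled < k)
            filled++;
    }
    return filled;
}
```

Calling `top_k(scores, n, idx, 5)` fills `idx` with up to five line indices, best first. For a large k, a binary heap would scale better.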

Run the game

The game uses the same models and retrieval pipeline as `npc`, but keeps NPC replies short.

```
./game -m phi-4-mini-instruct -e qwen3
```

context options

| Flag | Description |
| --- | --- |
| `-m, --model` | Embedding model to use (default: first model in config) |
| `-i, --in` | Input context text file (required) |
| `-o, --out` | Output vector database file (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |

npc options

| Flag | Description |
| --- | --- |
| `-m, --model` | Model to use (required) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-p, --prompt` | Prompt text (required) |
| `-c, --context` | Context vector database file (.vdb) (required) |
| `-l, --list` | List available models |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |
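The `npc` flags map naturally onto `getopt_long`. The following is a hypothetical sketch of how such parsing could look. `struct npc_args` and `parse_npc_args` are illustrative names, not taken from the project. It relies on `getopt_long` from `<getopt.h>`, which glibc and the BSDs provide but which is not part of ISO C.

```c
#include <getopt.h>
#include <stdio.h>

/* Hypothetical container for the npc command-line options. */
struct npc_args {
    const char *model;       /* -m, --model (required) */
    const char *embed_model; /* -e, --embed-model (optional) */
    const char *prompt;      /* -p, --prompt (required) */
    const char *context;     /* -c, --context (required) */
    int list;                /* -l, --list */
    int verbose;             /* -v, --verbose */
    int help;                /* -h, --help */
};

/* Parse argv into `args`. Returns 0 on success, or -1 on an unknown flag or a
 * missing required option (required options are not checked for -l / -h). */
static int parse_npc_args(int argc, char **argv, struct npc_args *args)
{
    static const struct option long_opts[] = {
        {"model",       required_argument, NULL, 'm'},
        {"embed-model", required_argument, NULL, 'e'},
        {"prompt",      required_argument, NULL, 'p'},
        {"context",     required_argument, NULL, 'c'},
        {"list",        no_argument,       NULL, 'l'},
        {"verbose",     no_argument,       NULL, 'v'},
        {"help",        no_argument,       NULL, 'h'},
        {NULL, 0, NULL, 0},
    };
    int opt;

    *args = (struct npc_args){0};
    optind = 1; /* restart scanning so the function can be called more than once */

    while ((opt = getopt_long(argc, argv, "m:e:p:c:lvh", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'm': args->model = optarg; break;
        case 'e': args->embed_model = optarg; break;
        case 'p': args->prompt = optarg; break;
        case 'c': args->context = optarg; break;
        case 'l': args->list = 1; break;
        case 'v': args->verbose = 1; break;
        case 'h': args->help = 1; break;
        default:  return -1; /* getopt_long has already printed a diagnostic */
        }
    }
    if (args->list || args->help)
        return 0;
    if (args->model == NULL || args->prompt == NULL || args->context == NULL) {
        fprintf(stderr, "npc: -m, -p and -c are required\n");
        return -1;
    }
    return 0;
}
```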

game options

| Flag | Description |
| --- | --- |
| `-m, --model` | Model to use (default: first model in config) |
| `-e, --embed-model` | Embedding model to use (optional) |
| `-v, --verbose` | Enable llama.cpp logging |
| `-h, --help` | Show help message |

Models

Configure models in `models.h`. The default model is the first entry in the models array; each entry points to a local GGUF file under `models/`.
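The README does not show the layout of `models.h`, so the following is only a guess at its shape: a static array of name/path pairs, where a missing `-m` falls back to the first entry. The struct, the `find_model` helper, the entry order, and the file names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shape of a model entry; the real definition lives in models.h. */
struct model_entry {
    const char *name; /* name passed to -m / -e */
    const char *path; /* local GGUF file under models/ */
};

/* Illustrative entries only: order and file names are placeholders. */
static const struct model_entry models[] = {
    {"phi-4-mini-instruct", "models/phi-4-mini-instruct.gguf"},
    {"qwen3",               "models/qwen3.gguf"},
};

/* Look up a model by name. A NULL name selects the default (first) entry;
 * an unknown name returns NULL so the caller can report an error. */
static const struct model_entry *find_model(const char *name)
{
    size_t count = sizeof models / sizeof models[0];

    if (name == NULL)
        return &models[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(models[i].name, name) == 0)
            return &models[i];
    return NULL;
}
```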

Vector database format

`context` produces a binary file with a fixed header followed by a contiguous list of documents. The header includes a magic value, version, embedding size, maximum text length, and document count. Each document stores the original text in a fixed-size field of `VDB_MAX_TEXT` bytes, followed by its embedding of `VDB_EMBED_SIZE` values.
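To illustrate that layout, here is a hypothetical writer and reader. The README names the header fields but not their types, order, or values. The `uint32_t` fields, the field order, the magic number, and the `SKETCH_*` sizes below are therefore placeholders, and the embeddings are assumed to be `float`.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder values; the real magic and VDB_* constants live in the project. */
#define SKETCH_MAGIC      0x31424456u /* hypothetical magic value */
#define SKETCH_VERSION    1u
#define SKETCH_MAX_TEXT   256         /* stands in for VDB_MAX_TEXT */
#define SKETCH_EMBED_SIZE 4           /* stands in for VDB_EMBED_SIZE */

struct vdb_header {
    uint32_t magic;
    uint32_t version;
    uint32_t embed_size; /* values per embedding */
    uint32_t max_text;   /* bytes reserved for each document's text */
    uint32_t count;      /* number of documents that follow */
};

struct vdb_doc {
    char  text[SKETCH_MAX_TEXT];          /* fixed-size, NUL-terminated text */
    float embedding[SKETCH_EMBED_SIZE];
};

/* Write the header followed by `count` fixed-size documents. Returns 0 on success. */
static int vdb_write(FILE *f, const struct vdb_doc *docs, uint32_t count)
{
    struct vdb_header h = {SKETCH_MAGIC, SKETCH_VERSION, SKETCH_EMBED_SIZE,
                           SKETCH_MAX_TEXT, count};

    if (fwrite(&h, sizeof h, 1, f) != 1)
        return -1;
    if (count > 0 && fwrite(docs, sizeof *docs, count, f) != count)
        return -1;
    return 0;
}

/* Validate the header, then read up to `max_docs` documents.
 * Returns the number read, or -1 on a short read or mismatched header. */
static long vdb_read(FILE *f, struct vdb_doc *docs, uint32_t max_docs)
{
    struct vdb_header h;

    if (fread(&h, sizeof h, 1, f) != 1)
        return -1;
    if (h.magic != SKETCH_MAGIC || h.version != SKETCH_VERSION ||
        h.embed_size != SKETCH_EMBED_SIZE || h.max_text != SKETCH_MAX_TEXT)
        return -1;

    uint32_t n = h.count < max_docs ? h.count : max_docs;
    if (fread(docs, sizeof *docs, n, f) != n)
        return -1;
    return (long)n;
}
```

Writing whole structs with a single `fwrite` keeps the format simple. The trade-off is that the file depends on the host's byte order and struct layout, which is usually acceptable when the same machine both produces and reads the file.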

Docker

```
make run/docker
```

Builds a Docker image and runs an interactive shell with the binaries and models under /app/.

Cleaning

```
make run/clean
```