# llama.cpp Web UI

A modern, feature-rich web interface for llama.cpp built with SvelteKit. This UI provides an intuitive chat interface with advanced file handling, conversation management, and comprehensive model interaction capabilities.

The WebUI supports two server operation modes:

- **MODEL mode** - Single model operation (standard llama-server)
- **ROUTER mode** - Multi-model operation with dynamic model loading/unloading

---

## Table of Contents

- [Features](#features)
- [Getting Started](#getting-started)
- [Tech Stack](#tech-stack)
- [Build Pipeline](#build-pipeline)
- [Architecture](#architecture)
- [Data Flows](#data-flows)
- [Architectural Patterns](#architectural-patterns)
- [Testing](#testing)
- [Project Structure](#project-structure)
- [Related Documentation](#related-documentation)

---

## Features

### Chat Interface

- **Streaming responses** with real-time updates
- **Reasoning content** - Support for models with thinking/reasoning blocks
- **Dark/light theme** with system preference detection
- **Responsive design** for desktop and mobile

### File Attachments

- **Images** - JPEG, PNG, GIF, WebP, SVG (with PNG conversion)
- **Documents** - PDF (text extraction or image conversion for vision models)
- **Audio** - MP3, WAV for audio-capable models
- **Text files** - Source code, markdown, and other text formats
- **Drag-and-drop** and paste support with rich previews
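
How an uploaded file might be routed to one of the categories above can be sketched as follows. This is a minimal illustration, not the WebUI's actual implementation: `classifyAttachment`, `AttachmentKind`, and the extension fallback list are hypothetical names chosen for this example, and the MIME types are the ones browsers commonly report for the listed formats.

```typescript
// Hypothetical attachment categories mirroring the list above.
type AttachmentKind = 'image' | 'pdf' | 'audio' | 'text' | 'unsupported';

// MIME types commonly reported by browsers for the listed formats (assumed, not exhaustive).
const IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp', 'image/svg+xml']);
const AUDIO_TYPES = new Set(['audio/mpeg', 'audio/wav', 'audio/x-wav']);
// Illustrative extension fallback for files with an empty or generic MIME type.
const TEXT_EXTENSIONS = new Set(['txt', 'md', 'ts', 'js', 'py', 'json', 'c', 'cpp', 'h']);

function classifyAttachment(mimeType: string, fileName: string): AttachmentKind {
	const mime = mimeType.toLowerCase();
	if (IMAGE_TYPES.has(mime)) return 'image';
	if (mime === 'application/pdf') return 'pdf';
	if (AUDIO_TYPES.has(mime)) return 'audio';
	if (mime.startsWith('text/')) return 'text';

	// Fall back to the file extension when the MIME type is not informative.
	const dot = fileName.lastIndexOf('.');
	const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';
	return TEXT_EXTENSIONS.has(extension) ? 'text' : 'unsupported';
}
```

Source files often arrive with an empty MIME type, which is why the sketch falls back to the extension before rejecting a file.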

### Conversation Management

- **Branching** - Branch conversations at any point by editing messages or regenerating responses, then navigate between branches
- **Regeneration** - Regenerate responses with optional model switching (ROUTER mode)
- **Import/Export** - JSON format for backup and sharing
- **Search** - Find conversations by title or content

### Advanced Rendering

- **Syntax highlighting** - Code blocks with language detection
- **Math formulas** - KaTeX rendering for LaTeX expressions
- **Markdown** - Full GFM support with tables, lists, and more

### Multi-Model Support (ROUTER mode)

- **Model selector** with Loaded/Available groups
- **Automatic loading** - Models load on selection
- **Modality validation** - Prevents sending images to non-vision models
- **LRU unloading** - Server auto-manages model cache

### Keyboard Shortcuts

| Shortcut           | Action               |
| ------------------ | -------------------- |
| `Shift+Ctrl/Cmd+O` | New chat             |
| `Shift+Ctrl/Cmd+E` | Edit conversation    |
| `Shift+Ctrl/Cmd+D` | Delete conversation  |
| `Ctrl/Cmd+K`       | Search conversations |
| `Ctrl/Cmd+B`       | Toggle sidebar       |
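
The table above can be read as a small lookup from key presses to actions. The sketch below shows one way to express that mapping; `matchShortcut`, `KeyPress`, and the action names are hypothetical and not taken from the WebUI source. `KeyPress` lists only the `KeyboardEvent` fields the matcher needs, so a real DOM event can be passed in directly.

```typescript
// Hypothetical action names for the shortcuts in the table above.
type ShortcutAction =
	| 'new-chat'
	| 'edit-conversation'
	| 'delete-conversation'
	| 'search-conversations'
	| 'toggle-sidebar';

// Minimal subset of the DOM KeyboardEvent fields needed for matching.
interface KeyPress {
	key: string;
	ctrlKey: boolean;
	metaKey: boolean;
	shiftKey: boolean;
}

function matchShortcut(event: KeyPress): ShortcutAction | null {
	// "Ctrl/Cmd" means either modifier is accepted.
	if (!(event.ctrlKey || event.metaKey)) return null;
	const key = event.key.toLowerCase();

	if (event.shiftKey) {
		if (key === 'o') return 'new-chat';
		if (key === 'e') return 'edit-conversation';
		if (key === 'd') return 'delete-conversation';
		return null;
	}
	if (key === 'k') return 'search-conversations';
	if (key === 'b') return 'toggle-sidebar';
	return null;
}
```

Lowercasing `key` matters because holding Shift usually makes the browser report an uppercase letter.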

### Developer Experience

- **Request tracking** - Monitor token generation with `/slots` endpoint
- **Storybook** - Component library with visual testing
- **Hot reload** - Instant updates during development

---

## Getting Started

### Prerequisites

- **Node.js** 18+ (20+ recommended)
- **npm** 9+
- **llama-server** running locally (for API access)

### 1. Install Dependencies

```bash
cd tools/server/webui
npm install
```

### 2. Start llama-server

In a separate terminal, start the backend server:

```bash
# Single model (MODEL mode)
./llama-server -m model.gguf

# Multi-model (ROUTER mode)
./llama-server --model-store /path/to/models
```

### 3. Start Development Servers

```bash
npm run dev
```

This starts:

- **Vite dev server** at `http://localhost:5173` - The main WebUI
- **Storybook** at `http://localhost:6006` - Component documentation

The Vite dev server proxies API requests to `http://localhost:8080` (default llama-server port):

```typescript
// vite.config.ts proxy configuration
proxy: {
  '/v1': 'http://localhost:8080',
  '/props': 'http://localhost:8080',
  '/slots': 'http://localhost:8080',
  '/models': 'http://localhost:8080'
}
```

### Development Workflow

1. Open `http://localhost:5173` in your browser
2. Make changes to `.svelte`, `.ts`, or `.css` files
3. Changes hot-reload instantly
4. Use Storybook at `http://localhost:6006` for isolated component development

---

## Tech Stack

| Layer             | Technology                      | Purpose                                                  |
| ----------------- | ------------------------------- | -------------------------------------------------------- |
| **Framework**     | SvelteKit + Svelte 5            | Reactive UI with runes (`$state`, `$derived`, `$effect`) |
| **UI Components** | shadcn-svelte + bits-ui         | Accessible, customizable component library               |
| **Styling**       | TailwindCSS 4                   | Utility-first CSS with design tokens                     |
| **Database**      | IndexedDB (Dexie)               | Client-side storage for conversations and messages       |
| **Build**         | Vite                            | Fast bundling with static adapter                        |
| **Testing**       | Playwright + Vitest + Storybook | E2E, unit, and visual testing                            |
| **Markdown**      | remark + rehype                 | Markdown processing with KaTeX and syntax highlighting   |

### Key Dependencies

```json
{
	"svelte": "^5.0.0",
	"bits-ui": "^2.8.11",
	"dexie": "^4.0.11",
	"pdfjs-dist": "^5.4.54",
	"highlight.js": "^11.11.1",
	"rehype-katex": "^7.0.1"
}
```

---

## Build Pipeline

### Development Build

```bash
npm run dev
```

Runs Vite in development mode with:

- Hot Module Replacement (HMR)
- Source maps
- Proxy to llama-server

### Production Build

```bash
npm run build
```

The build process:

1. **Vite Build** - Bundles all TypeScript, Svelte, and CSS
2. **Static Adapter** - Outputs to `../public` (llama-server's static file directory)
3. **Post-Build Script** - Cleans up intermediate files
4. **Custom Plugin** - Creates `index.html.gz` with:
   - Inlined favicon as base64
   - GZIP compression (level 9)
   - Deterministic output (zeroed timestamps)

```text
tools/server/webui/        →  build  →  tools/server/public/
├── src/                                 ├── index.html.gz  (served by llama-server)
├── static/                              └── (favicon inlined)
└── ...
```
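
The compression step of the custom plugin (level 9, zeroed timestamps) can be approximated with Node's built-in `zlib`. This is a sketch under assumptions, not the plugin's code: `gzipDeterministic` and `gunzipToString` are hypothetical helpers, and the real plugin may use a different compression library. The gzip header stores a modification time in bytes 4 to 7 (RFC 1952); clearing it keeps the output byte-identical across builds on the same platform.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Compress HTML at the maximum level and clear the gzip header's MTIME field
// (bytes 4-7) so identical input always yields identical bytes.
function gzipDeterministic(html: string): Uint8Array {
	const compressed = new Uint8Array(gzipSync(new TextEncoder().encode(html), { level: 9 }));
	compressed.fill(0, 4, 8);
	return compressed;
}

// Round-trip helper to confirm that clearing MTIME leaves the payload intact.
function gunzipToString(data: Uint8Array): string {
	return new TextDecoder().decode(gunzipSync(data));
}
```

The gzip checksum in the trailer covers only the uncompressed data, so editing the header timestamp does not invalidate the archive.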

### SvelteKit Configuration

```javascript
// svelte.config.js
adapter: adapter({
  pages: '../public',      // Output directory
  assets: '../public',     // Static assets
  fallback: 'index.html',  // SPA fallback
  strict: true
}),
output: {
  bundleStrategy: 'inline' // Single-file bundle
}
```

### Integration with llama-server

The WebUI is embedded directly into the llama-server binary:

1. `npm run build` outputs `index.html.gz` to `tools/server/public/`
2. llama-server compiles this into the binary at build time
3. When accessing `/`, llama-server serves the gzipped HTML
4. All assets are inlined (CSS, JS, fonts, favicon)

This results in a **single portable binary** with the full WebUI included.

---

## Architecture

The WebUI follows a layered architecture with unidirectional data flow:

```text
Routes → Components → Hooks → Stores → Services → Storage/API
```

### High-Level Architecture

See: [`docs/architecture/high-level-architecture-simplified.md`](docs/architecture/high-level-architecture-simplified.md)

```mermaid
flowchart TB
    subgraph Routes["📍 Routes"]
        R1["/ (Welcome)"]
        R2["/chat/[id]"]
        RL["+layout.svelte"]
    end

    subgraph Components["🧩 Components"]
        C_Sidebar["ChatSidebar"]
        C_Screen["ChatScreen"]
        C_Form["ChatForm"]
        C_Messages["ChatMessages"]
        C_ModelsSelector["ModelsSelector"]
        C_Settings["ChatSettings"]
    end

    subgraph Stores["🗄️ Stores"]
        S1["chatStore"]
        S2["conversationsStore"]
        S3["modelsStore"]
        S4["serverStore"]
        S5["settingsStore"]
    end

    subgraph Services["⚙️ Services"]
        SV1["ChatService"]
        SV2["ModelsService"]
        SV3["PropsService"]
        SV4["DatabaseService"]
    end

    subgraph Storage["💾 Storage"]
        ST1["IndexedDB"]
        ST2["LocalStorage"]
    end

    subgraph APIs["🌐 llama-server"]
        API1["/v1/chat/completions"]
        API2["/props"]
        API3["/models/*"]
    end

    R1 & R2 --> C_Screen
    RL --> C_Sidebar
    C_Screen --> C_Form & C_Messages & C_Settings
    C_Screen --> S1 & S2
    C_ModelsSelector --> S3 & S4
    S1 --> SV1 & SV4
    S3 --> SV2 & SV3
    SV4 --> ST1
    SV1 --> API1
    SV2 --> API3
    SV3 --> API2
```

### Layer Breakdown

#### Routes (`src/routes/`)

- **`/`** - Welcome screen, creates new conversation
- **`/chat/[id]`** - Active chat interface
- **`+layout.svelte`** - Sidebar, navigation, global initialization

#### Components (`src/lib/components/`)

Components are organized in `app/` (application-specific) and `ui/` (shadcn-svelte primitives).

**Chat Components** (`app/chat/`):

| Component          | Responsibility                                                              |
| ------------------ | --------------------------------------------------------------------------- |
| `ChatScreen/`      | Main chat container, coordinates message list, input form, and attachments  |
| `ChatForm/`        | Message input textarea with file upload, paste handling, keyboard shortcuts |
| `ChatMessages/`    | Message list with branch navigation, regenerate/continue/edit actions       |
| `ChatAttachments/` | File attachment previews, drag-and-drop, PDF/image/audio handling           |
| `ChatSettings/`    | Parameter sliders (temperature, top-p, etc.) with server default sync       |
| `ChatSidebar/`     | Conversation list, search, import/export, navigation                        |

**Dialog Components** (`app/dialogs/`):

| Component                       | Responsibility                                           |
| ------------------------------- | -------------------------------------------------------- |
| `DialogChatSettings`            | Full-screen settings configuration                       |
| `DialogModelInformation`        | Model details (context size, modalities, parallel slots) |
| `DialogChatAttachmentPreview`   | Full preview for images, PDFs (text or page view), code  |
| `DialogConfirmation`            | Generic confirmation for destructive actions             |
| `DialogConversationTitleUpdate` | Edit conversation title                                  |

**Server/Model Components** (`app/server/`, `app/models/`):

| Component           | Responsibility                                            |
| ------------------- | --------------------------------------------------------- |
| `ServerErrorSplash` | Error display when server is unreachable                  |
| `ModelsSelector`    | Model dropdown with Loaded/Available groups (ROUTER mode) |

**Shared UI Components** (`app/misc/`):

| Component                        | Responsibility                                                   |
| -------------------------------- | ---------------------------------------------------------------- |
| `MarkdownContent`                | Markdown rendering with KaTeX, syntax highlighting, copy buttons |
| `SyntaxHighlightedCode`          | Code blocks with language detection and highlighting             |
| `ActionButton`, `ActionDropdown` | Reusable action buttons and menus                                |
| `BadgeModality`, `BadgeInfo`     | Status and capability badges                                     |

#### Hooks (`src/lib/hooks/`)

- **`useModelChangeValidation`** - Validates model switch against conversation modalities
- **`useProcessingState`** - Tracks streaming progress and token generation

#### Stores (`src/lib/stores/`)

| Store                | Responsibility                                            |
| -------------------- | --------------------------------------------------------- |
| `chatStore`          | Message sending, streaming, abort control, error handling |
| `conversationsStore` | CRUD for conversations, message branching, navigation     |
| `modelsStore`        | Model list, selection, loading/unloading (ROUTER)         |
| `serverStore`        | Server properties, role detection, modalities             |
| `settingsStore`      | User preferences, parameter sync with server defaults     |

#### Services (`src/lib/services/`)

| Service                | Responsibility                                   |
| ---------------------- | ------------------------------------------------ |
| `ChatService`          | API calls to `/v1/chat/completions`, SSE parsing |
| `ModelsService`        | `/models`, `/models/load`, `/models/unload`      |
| `PropsService`         | `/props`, `/props?model=`                        |
| `DatabaseService`      | IndexedDB operations via Dexie                   |
| `ParameterSyncService` | Syncs settings with server defaults              |

---

## Data Flows

### MODEL Mode (Single Model)

See: [`docs/flows/data-flow-simplified-model-mode.md`](docs/flows/data-flow-simplified-model-mode.md)

```mermaid
sequenceDiagram
    participant User
    participant UI
    participant Stores
    participant DB as IndexedDB
    participant API as llama-server

    Note over User,API: Initialization
    UI->>Stores: initialize()
    Stores->>DB: load conversations
    Stores->>API: GET /props
    API-->>Stores: server config
    Stores->>API: GET /v1/models
    API-->>Stores: single model (auto-selected)

    Note over User,API: Chat Flow
    User->>UI: send message
    Stores->>DB: save user message
    Stores->>API: POST /v1/chat/completions (stream)
    loop streaming
        API-->>Stores: SSE chunks
        Stores-->>UI: reactive update
    end
    Stores->>DB: save assistant message
```
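
The streaming loop in the diagram comes down to turning raw SSE text into an accumulated response. The sketch below shows that step in isolation. It assumes the OpenAI-compatible stream format (`data: {json}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`); `SseAccumulator` is a hypothetical class, not the WebUI's `ChatService`.

```typescript
// Fields this sketch reads from each streamed chunk (OpenAI-compatible format assumed).
interface StreamChunk {
	choices?: { delta?: { content?: string } }[];
}

// Hypothetical incremental parser: feed it raw text as it arrives from the network.
class SseAccumulator {
	private buffer = '';
	response = '';
	done = false;

	push(text: string): void {
		this.buffer += text;
		const lines = this.buffer.split('\n');
		// The last element may be an incomplete line; keep it for the next push.
		this.buffer = lines.pop() ?? '';

		for (const rawLine of lines) {
			const line = rawLine.trim();
			if (!line.startsWith('data:')) continue; // skip blank separator lines
			const payload = line.slice('data:'.length).trim();
			if (payload === '[DONE]') {
				this.done = true;
				return;
			}
			const chunk = JSON.parse(payload) as StreamChunk;
			this.response += chunk.choices?.[0]?.delta?.content ?? '';
		}
	}
}
```

Buffering the trailing partial line matters because network chunks do not necessarily end on an event boundary.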

### ROUTER Mode (Multi-Model)

See: [`docs/flows/data-flow-simplified-router-mode.md`](docs/flows/data-flow-simplified-router-mode.md)

```mermaid
sequenceDiagram
    participant User
    participant UI
    participant Stores
    participant API as llama-server

    Note over User,API: Initialization
    Stores->>API: GET /props
    API-->>Stores: {role: "router"}
    Stores->>API: GET /models
    API-->>Stores: models[] with status

    Note over User,API: Model Selection
    User->>UI: select model
    alt model not loaded
        Stores->>API: POST /models/load
        loop poll status
            Stores->>API: GET /models
        end
        Stores->>API: GET /props?model=X
    end
    Stores->>Stores: validate modalities

    Note over User,API: Chat Flow
    Stores->>API: POST /v1/chat/completions {model: X}
    loop streaming
        API-->>Stores: SSE chunks + model info
    end
```
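
The main request-level difference between the two diagrams is that ROUTER mode names a model in the chat request. A minimal sketch of that decision follows. `detectRole` and `buildChatRequest` are hypothetical helpers, and treating any `/props` response without `role: "router"` as MODEL mode is an assumption made for this example.

```typescript
// Server roles, following the MODEL/ROUTER naming used in this document.
const ServerRole = { MODEL: 'model', ROUTER: 'router' } as const;
type ServerRole = (typeof ServerRole)[keyof typeof ServerRole];

interface ChatMessage {
	role: 'system' | 'user' | 'assistant';
	content: string;
}

interface ChatRequestBody {
	messages: ChatMessage[];
	stream: boolean;
	model?: string;
}

// Assumption: /props reports `role: "router"` in ROUTER mode; anything else is MODEL mode.
function detectRole(props: { role?: string }): ServerRole {
	return props.role === 'router' ? ServerRole.ROUTER : ServerRole.MODEL;
}

// Build the JSON body for POST /v1/chat/completions; `model` is only sent in ROUTER mode.
function buildChatRequest(role: ServerRole, messages: ChatMessage[], selectedModel?: string): ChatRequestBody {
	const body: ChatRequestBody = { messages, stream: true };
	if (role === ServerRole.ROUTER) {
		if (!selectedModel) throw new Error('ROUTER mode requires a selected model');
		body.model = selectedModel;
	}
	return body;
}
```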

### Detailed Flow Diagrams

| Flow          | Description                                | File                                                        |
| ------------- | ------------------------------------------ | ----------------------------------------------------------- |
| Chat          | Message lifecycle, streaming, regeneration | [`chat-flow.md`](docs/flows/chat-flow.md)                   |
| Models        | Loading, unloading, modality caching       | [`models-flow.md`](docs/flows/models-flow.md)               |
| Server        | Props fetching, role detection             | [`server-flow.md`](docs/flows/server-flow.md)               |
| Conversations | CRUD, branching, import/export             | [`conversations-flow.md`](docs/flows/conversations-flow.md) |
| Database      | IndexedDB schema, operations               | [`database-flow.md`](docs/flows/database-flow.md)           |
| Settings      | Parameter sync, user overrides             | [`settings-flow.md`](docs/flows/settings-flow.md)           |

---

## Architectural Patterns

### 1. Reactive State with Svelte 5 Runes

All stores use Svelte 5's fine-grained reactivity:

```typescript
// Store with reactive state (runes require a .svelte.ts module)
class ChatStore {
	isLoading = $state(false);
	currentResponse = $state('');

	// Derived values auto-update; $derived must initialize a field, not be returned from a getter
	isStreaming = $derived(this.isLoading && this.currentResponse.length > 0);
}

export const chatStore = new ChatStore();

// Exported reactive accessors
export const isLoading = () => chatStore.isLoading;
export const currentResponse = () => chatStore.currentResponse;
```

### 2. Unidirectional Data Flow

Data flows in one direction, making state predictable:

```mermaid
flowchart LR
    subgraph UI["UI Layer"]
        A[User Action] --> B[Component]
    end

    subgraph State["State Layer"]
        B --> C[Store Method]
        C --> D[State Update]
    end

    subgraph IO["I/O Layer"]
        C --> E[Service]
        E --> F[API / IndexedDB]
        F -.->|Response| D
    end

    D -->|Reactive| B
```

Components dispatch actions to stores, stores coordinate with services for I/O, and state updates reactively propagate back to the UI.

### 3. Per-Conversation State

Enables concurrent streaming across multiple conversations:

```typescript
class ChatStore {
	chatLoadingStates = new Map<string, boolean>();
	chatStreamingStates = new Map<string, { response: string; messageId: string }>();
	abortControllers = new Map<string, AbortController>();
}
```
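
To illustrate why the abort controllers are keyed by conversation, here is a small sketch of a registry that can stop one stream without touching the others. `StreamRegistry` and its method names are hypothetical and only mirror the `abortControllers` map above; the store's real methods may differ.

```typescript
// Hypothetical registry: one AbortController per conversation ID, so stopping
// one stream never affects another conversation that is still generating.
class StreamRegistry {
	private controllers = new Map<string, AbortController>();

	// Begin (or restart) a stream for a conversation and return its abort signal.
	start(conversationId: string): AbortSignal {
		this.controllers.get(conversationId)?.abort(); // cancel any previous stream
		const controller = new AbortController();
		this.controllers.set(conversationId, controller);
		return controller.signal;
	}

	// Abort only the given conversation's stream.
	stop(conversationId: string): void {
		this.controllers.get(conversationId)?.abort();
		this.controllers.delete(conversationId);
	}

	isStreaming(conversationId: string): boolean {
		return this.controllers.has(conversationId);
	}
}
```

The signal returned by `start` would typically be passed to `fetch(url, { signal })` so that aborting cancels the in-flight request.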

### 4. Message Branching with Tree Structure

Conversations are stored as a tree, not a linear list:

```typescript
interface DatabaseMessage {
	id: string;
	parent: string | null; // Points to parent message
	children: string[]; // List of child message IDs
	// ...
}

interface DatabaseConversation {
	currentNode: string; // Currently viewed branch tip
	// ...
}
```

Navigation between branches updates `currentNode` without losing history.
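
Given this tree, the messages to display for the current branch can be recovered by following `parent` links from `currentNode` back to the root. The sketch below assumes messages are available in a `Map` keyed by ID; `getActivePath` and `MessageNode` are hypothetical names for this illustration.

```typescript
// Minimal message shape, matching the `parent`/`children` fields shown above.
interface MessageNode {
	id: string;
	parent: string | null;
	children: string[];
}

// Walk parent links from the current node to the root, then reverse to get
// the messages of the active branch in display order.
function getActivePath(messages: Map<string, MessageNode>, currentNode: string): MessageNode[] {
	const path: MessageNode[] = [];
	const seen = new Set<string>();
	let cursor: string | null = currentNode;

	while (cursor !== null && !seen.has(cursor)) {
		seen.add(cursor); // guard against accidental cycles in stored data
		const node = messages.get(cursor);
		if (!node) break;
		path.push(node);
		cursor = node.parent;
	}
	return path.reverse();
}
```

Switching branches then only requires pointing `currentNode` at a different leaf; sibling branches stay in the map untouched.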

### 5. Layered Service Architecture

Stores handle state; services handle I/O:

```text
┌─────────────────┐
│     Stores      │  Business logic, state management
├─────────────────┤
│    Services     │  API calls, database operations
├─────────────────┤
│   Storage/API   │  IndexedDB, LocalStorage, HTTP
└─────────────────┘
```

### 6. Server Role Abstraction

Single codebase handles both MODEL and ROUTER modes:

```typescript
// serverStore.ts
get isRouterMode() {
  return this.role === ServerRole.ROUTER;
}

// Components conditionally render based on mode
{#if isRouterMode()}
  <ModelsSelector />
{/if}
```

### 7. Modality Validation

Prevents sending attachments to incompatible models:

```typescript
// useModelChangeValidation hook
const validate = (modelId: string) => {
	const modelModalities = modelsStore.getModelModalities(modelId);
	const conversationModalities = conversationsStore.usedModalities;

	// Check if model supports all used modalities
	if (conversationModalities.hasImages && !modelModalities.vision) {
		return { valid: false, reason: 'Model does not support images' };
	}
	// ...
};
```

### 8. Persistent Storage Strategy

Data is persisted across sessions using two storage mechanisms:

```mermaid
flowchart TB
    subgraph Browser["Browser Storage"]
        subgraph IDB["IndexedDB (Dexie)"]
            C[Conversations]
            M[Messages]
        end
        subgraph LS["LocalStorage"]
            S[Settings Config]
            O[User Overrides]
            T[Theme Preference]
        end
    end

    subgraph Stores["Svelte Stores"]
        CS[conversationsStore] --> C
        CS --> M
        SS[settingsStore] --> S
        SS --> O
        SS --> T
    end
```

- **IndexedDB**: Conversations and messages (large, structured data)
- **LocalStorage**: Settings, user parameter overrides, theme (small key-value data)
- **Memory only**: Server props, model list (fetched fresh on each session)
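
One way the split between persisted overrides and in-memory server defaults could work is sketched below. It is an illustration under assumptions: the `KeyValueStorage` interface, the `example.userOverrides` key, and both functions are hypothetical, and the rule that overrides win on conflict is this example's choice rather than documented behavior. The browser's `localStorage` satisfies the interface, and an in-memory object can stand in for it during tests.

```typescript
// Minimal storage interface; the browser's `localStorage` satisfies it.
interface KeyValueStorage {
	getItem(key: string): string | null;
	setItem(key: string, value: string): void;
}

type Params = Record<string, number>;

// Hypothetical storage key used only in this sketch.
const OVERRIDES_KEY = 'example.userOverrides';

// Persist only the values the user changed, as a small JSON document.
function saveOverrides(storage: KeyValueStorage, overrides: Params): void {
	storage.setItem(OVERRIDES_KEY, JSON.stringify(overrides));
}

// Server defaults live in memory only; persisted user overrides win on conflict.
function effectiveParams(storage: KeyValueStorage, serverDefaults: Params): Params {
	const raw = storage.getItem(OVERRIDES_KEY);
	const overrides: Params = raw ? JSON.parse(raw) : {};
	return { ...serverDefaults, ...overrides };
}
```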

---

## Testing

### Test Types

| Type          | Tool               | Location         | Command             |
| ------------- | ------------------ | ---------------- | ------------------- |
| **Unit**      | Vitest             | `tests/unit/`    | `npm run test:unit` |
| **UI/Visual** | Storybook + Vitest | `tests/stories/` | `npm run test:ui`   |
| **E2E**       | Playwright         | `tests/e2e/`     | `npm run test:e2e`  |
| **Client**    | Vitest             | `tests/client/`  | `npm run test:client` |

### Running Tests

```bash
# All tests
npm run test

# Individual test suites
npm run test:e2e      # End-to-end (requires llama-server)
npm run test:client   # Client-side unit tests
npm run test:server   # Server-side unit tests
npm run test:ui       # Storybook visual tests
```

### Storybook Development

```bash
npm run storybook     # Start Storybook dev server on :6006
npm run build-storybook  # Build static Storybook
```

### Linting and Formatting

```bash
npm run lint          # Check code style
npm run format        # Auto-format with Prettier
npm run check         # TypeScript type checking
```

---

## Project Structure

```text
tools/server/webui/
├── src/
│   ├── lib/
│   │   ├── components/   # UI components (app/, ui/)
│   │   ├── hooks/        # Svelte hooks
│   │   ├── stores/       # State management
│   │   ├── services/     # API and database services
│   │   ├── types/        # TypeScript interfaces
│   │   └── utils/        # Utility functions
│   ├── routes/           # SvelteKit routes
│   └── styles/           # Global styles
├── static/               # Static assets
├── tests/                # Test files
├── docs/                 # Architecture diagrams
│   ├── architecture/     # High-level architecture
│   └── flows/            # Feature-specific flows
└── .storybook/           # Storybook configuration
```

---

## Related Documentation

- [llama.cpp Server README](../README.md) - Full server documentation
- [Multimodal Documentation](../../../docs/multimodal.md) - Image and audio support
- [Function Calling](../../../docs/function-calling.md) - Tool use capabilities