summaryrefslogtreecommitdiff
path: root/llama.cpp/tools/server/webui/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'llama.cpp/tools/server/webui/README.md')
-rw-r--r--llama.cpp/tools/server/webui/README.md687
1 files changed, 687 insertions, 0 deletions
diff --git a/llama.cpp/tools/server/webui/README.md b/llama.cpp/tools/server/webui/README.md
new file mode 100644
index 0000000..98b01fd
--- /dev/null
+++ b/llama.cpp/tools/server/webui/README.md
@@ -0,0 +1,687 @@
+# llama.cpp Web UI
+
+A modern, feature-rich web interface for llama.cpp built with SvelteKit. This UI provides an intuitive chat interface with advanced file handling, conversation management, and comprehensive model interaction capabilities.
+
+The WebUI supports two server operation modes:
+
+- **MODEL mode** - Single model operation (standard llama-server)
+- **ROUTER mode** - Multi-model operation with dynamic model loading/unloading
+
+---
+
+## Table of Contents
+
+- [Features](#features)
+- [Getting Started](#getting-started)
+- [Tech Stack](#tech-stack)
+- [Build Pipeline](#build-pipeline)
+- [Architecture](#architecture)
+- [Data Flows](#data-flows)
+- [Architectural Patterns](#architectural-patterns)
+- [Testing](#testing)
+
+---
+
+## Features
+
+### Chat Interface
+
+- **Streaming responses** with real-time updates
+- **Reasoning content** - Support for models with thinking/reasoning blocks
+- **Dark/light theme** with system preference detection
+- **Responsive design** for desktop and mobile
+
+### File Attachments
+
+- **Images** - JPEG, PNG, GIF, WebP, SVG (with PNG conversion)
+- **Documents** - PDF (text extraction or image conversion for vision models)
+- **Audio** - MP3, WAV for audio-capable models
+- **Text files** - Source code, markdown, and other text formats
+- **Drag-and-drop** and paste support with rich previews
+
+### Conversation Management
+
+- **Branching** - Branch messages conversations at any point by editing messages or regenerating responses, navigate between branches
+- **Regeneration** - Regenerate responses with optional model switching (ROUTER mode)
+- **Import/Export** - JSON format for backup and sharing
+- **Search** - Find conversations by title or content
+
+### Advanced Rendering
+
+- **Syntax highlighting** - Code blocks with language detection
+- **Math formulas** - KaTeX rendering for LaTeX expressions
+- **Markdown** - Full GFM support with tables, lists, and more
+
+### Multi-Model Support (ROUTER mode)
+
+- **Model selector** with Loaded/Available groups
+- **Automatic loading** - Models load on selection
+- **Modality validation** - Prevents sending images to non-vision models
+- **LRU unloading** - Server auto-manages model cache
+
+### Keyboard Shortcuts
+
+| Shortcut | Action |
+| ------------------ | -------------------- |
+| `Shift+Ctrl/Cmd+O` | New chat |
+| `Shift+Ctrl/Cmd+E` | Edit conversation |
+| `Shift+Ctrl/Cmd+D` | Delete conversation |
+| `Ctrl/Cmd+K` | Search conversations |
+| `Ctrl/Cmd+B` | Toggle sidebar |
+
+### Developer Experience
+
+- **Request tracking** - Monitor token generation with `/slots` endpoint
+- **Storybook** - Component library with visual testing
+- **Hot reload** - Instant updates during development
+
+---
+
+## Getting Started
+
+### Prerequisites
+
+- **Node.js** 18+ (20+ recommended)
+- **npm** 9+
+- **llama-server** running locally (for API access)
+
+### 1. Install Dependencies
+
+```bash
+cd tools/server/webui
+npm install
+```
+
+### 2. Start llama-server
+
+In a separate terminal, start the backend server:
+
+```bash
+# Single model (MODEL mode)
+./llama-server -m model.gguf
+
+# Multi-model (ROUTER mode)
+./llama-server --model-store /path/to/models
+```
+
+### 3. Start Development Servers
+
+```bash
+npm run dev
+```
+
+This starts:
+
+- **Vite dev server** at `http://localhost:5173` - The main WebUI
+- **Storybook** at `http://localhost:6006` - Component documentation
+
+The Vite dev server proxies API requests to `http://localhost:8080` (default llama-server port):
+
+```typescript
+// vite.config.ts proxy configuration
+proxy: {
+ '/v1': 'http://localhost:8080',
+ '/props': 'http://localhost:8080',
+ '/slots': 'http://localhost:8080',
+ '/models': 'http://localhost:8080'
+}
+```
+
+### Development Workflow
+
+1. Open `http://localhost:5173` in your browser
+2. Make changes to `.svelte`, `.ts`, or `.css` files
+3. Changes hot-reload instantly
+4. Use Storybook at `http://localhost:6006` for isolated component development
+
+---
+
+## Tech Stack
+
+| Layer | Technology | Purpose |
+| ----------------- | ------------------------------- | -------------------------------------------------------- |
+| **Framework** | SvelteKit + Svelte 5 | Reactive UI with runes (`$state`, `$derived`, `$effect`) |
+| **UI Components** | shadcn-svelte + bits-ui | Accessible, customizable component library |
+| **Styling** | TailwindCSS 4 | Utility-first CSS with design tokens |
+| **Database** | IndexedDB (Dexie) | Client-side storage for conversations and messages |
+| **Build** | Vite | Fast bundling with static adapter |
+| **Testing** | Playwright + Vitest + Storybook | E2E, unit, and visual testing |
+| **Markdown** | remark + rehype | Markdown processing with KaTeX and syntax highlighting |
+
+### Key Dependencies
+
+```json
+{
+ "svelte": "^5.0.0",
+ "bits-ui": "^2.8.11",
+ "dexie": "^4.0.11",
+ "pdfjs-dist": "^5.4.54",
+ "highlight.js": "^11.11.1",
+ "rehype-katex": "^7.0.1"
+}
+```
+
+---
+
+## Build Pipeline
+
+### Development Build
+
+```bash
+npm run dev
+```
+
+Runs Vite in development mode with:
+
+- Hot Module Replacement (HMR)
+- Source maps
+- Proxy to llama-server
+
+### Production Build
+
+```bash
+npm run build
+```
+
+The build process:
+
+1. **Vite Build** - Bundles all TypeScript, Svelte, and CSS
+2. **Static Adapter** - Outputs to `../public` (llama-server's static file directory)
+3. **Post-Build Script** - Cleans up intermediate files
+4. **Custom Plugin** - Creates `index.html.gz` with:
+ - Inlined favicon as base64
+ - GZIP compression (level 9)
+ - Deterministic output (zeroed timestamps)
+
+```text
+tools/server/webui/ → build → tools/server/public/
+├── src/ ├── index.html.gz (served by llama-server)
+├── static/ └── (favicon inlined)
+└── ...
+```
+
+### SvelteKit Configuration
+
+```javascript
+// svelte.config.js
+adapter: adapter({
+ pages: '../public', // Output directory
+ assets: '../public', // Static assets
+ fallback: 'index.html', // SPA fallback
+ strict: true
+}),
+output: {
+ bundleStrategy: 'inline' // Single-file bundle
+}
+```
+
+### Integration with llama-server
+
+The WebUI is embedded directly into the llama-server binary:
+
+1. `npm run build` outputs `index.html.gz` to `tools/server/public/`
+2. llama-server compiles this into the binary at build time
+3. When accessing `/`, llama-server serves the gzipped HTML
+4. All assets are inlined (CSS, JS, fonts, favicon)
+
+This results in a **single portable binary** with the full WebUI included.
+
+---
+
+## Architecture
+
+The WebUI follows a layered architecture with unidirectional data flow:
+
+```text
+Routes → Components → Hooks → Stores → Services → Storage/API
+```
+
+### High-Level Architecture
+
+See: [`docs/architecture/high-level-architecture-simplified.md`](docs/architecture/high-level-architecture-simplified.md)
+
+```mermaid
+flowchart TB
+ subgraph Routes["📍 Routes"]
+ R1["/ (Welcome)"]
+ R2["/chat/[id]"]
+ RL["+layout.svelte"]
+ end
+
+ subgraph Components["🧩 Components"]
+ C_Sidebar["ChatSidebar"]
+ C_Screen["ChatScreen"]
+ C_Form["ChatForm"]
+ C_Messages["ChatMessages"]
+ C_ModelsSelector["ModelsSelector"]
+ C_Settings["ChatSettings"]
+ end
+
+ subgraph Stores["🗄️ Stores"]
+ S1["chatStore"]
+ S2["conversationsStore"]
+ S3["modelsStore"]
+ S4["serverStore"]
+ S5["settingsStore"]
+ end
+
+ subgraph Services["⚙️ Services"]
+ SV1["ChatService"]
+ SV2["ModelsService"]
+ SV3["PropsService"]
+ SV4["DatabaseService"]
+ end
+
+ subgraph Storage["💾 Storage"]
+ ST1["IndexedDB"]
+ ST2["LocalStorage"]
+ end
+
+ subgraph APIs["🌐 llama-server"]
+ API1["/v1/chat/completions"]
+ API2["/props"]
+ API3["/models/*"]
+ end
+
+ R1 & R2 --> C_Screen
+ RL --> C_Sidebar
+ C_Screen --> C_Form & C_Messages & C_Settings
+ C_Screen --> S1 & S2
+ C_ModelsSelector --> S3 & S4
+ S1 --> SV1 & SV4
+ S3 --> SV2 & SV3
+ SV4 --> ST1
+ SV1 --> API1
+ SV2 --> API3
+ SV3 --> API2
+```
+
+### Layer Breakdown
+
+#### Routes (`src/routes/`)
+
+- **`/`** - Welcome screen, creates new conversation
+- **`/chat/[id]`** - Active chat interface
+- **`+layout.svelte`** - Sidebar, navigation, global initialization
+
+#### Components (`src/lib/components/`)
+
+Components are organized in `app/` (application-specific) and `ui/` (shadcn-svelte primitives).
+
+**Chat Components** (`app/chat/`):
+
+| Component | Responsibility |
+| ------------------ | --------------------------------------------------------------------------- |
+| `ChatScreen/` | Main chat container, coordinates message list, input form, and attachments |
+| `ChatForm/` | Message input textarea with file upload, paste handling, keyboard shortcuts |
+| `ChatMessages/` | Message list with branch navigation, regenerate/continue/edit actions |
+| `ChatAttachments/` | File attachment previews, drag-and-drop, PDF/image/audio handling |
+| `ChatSettings/` | Parameter sliders (temperature, top-p, etc.) with server default sync |
+| `ChatSidebar/` | Conversation list, search, import/export, navigation |
+
+**Dialog Components** (`app/dialogs/`):
+
+| Component | Responsibility |
+| ------------------------------- | -------------------------------------------------------- |
+| `DialogChatSettings` | Full-screen settings configuration |
+| `DialogModelInformation` | Model details (context size, modalities, parallel slots) |
+| `DialogChatAttachmentPreview` | Full preview for images, PDFs (text or page view), code |
+| `DialogConfirmation` | Generic confirmation for destructive actions |
+| `DialogConversationTitleUpdate` | Edit conversation title |
+
+**Server/Model Components** (`app/server/`, `app/models/`):
+
+| Component | Responsibility |
+| ------------------- | --------------------------------------------------------- |
+| `ServerErrorSplash` | Error display when server is unreachable |
+| `ModelsSelector` | Model dropdown with Loaded/Available groups (ROUTER mode) |
+
+**Shared UI Components** (`app/misc/`):
+
+| Component | Responsibility |
+| -------------------------------- | ---------------------------------------------------------------- |
+| `MarkdownContent` | Markdown rendering with KaTeX, syntax highlighting, copy buttons |
+| `SyntaxHighlightedCode` | Code blocks with language detection and highlighting |
+| `ActionButton`, `ActionDropdown` | Reusable action buttons and menus |
+| `BadgeModality`, `BadgeInfo` | Status and capability badges |
+
+#### Hooks (`src/lib/hooks/`)
+
+- **`useModelChangeValidation`** - Validates model switch against conversation modalities
+- **`useProcessingState`** - Tracks streaming progress and token generation
+
+#### Stores (`src/lib/stores/`)
+
+| Store | Responsibility |
+| -------------------- | --------------------------------------------------------- |
+| `chatStore` | Message sending, streaming, abort control, error handling |
+| `conversationsStore` | CRUD for conversations, message branching, navigation |
+| `modelsStore` | Model list, selection, loading/unloading (ROUTER) |
+| `serverStore` | Server properties, role detection, modalities |
+| `settingsStore` | User preferences, parameter sync with server defaults |
+
+#### Services (`src/lib/services/`)
+
+| Service | Responsibility |
+| ---------------------- | ----------------------------------------------- |
+| `ChatService` | API calls to`/v1/chat/completions`, SSE parsing |
+| `ModelsService` | `/models`, `/models/load`, `/models/unload` |
+| `PropsService` | `/props`, `/props?model=` |
+| `DatabaseService` | IndexedDB operations via Dexie |
+| `ParameterSyncService` | Syncs settings with server defaults |
+
+---
+
+## Data Flows
+
+### MODEL Mode (Single Model)
+
+See: [`docs/flows/data-flow-simplified-model-mode.md`](docs/flows/data-flow-simplified-model-mode.md)
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant UI
+ participant Stores
+ participant DB as IndexedDB
+ participant API as llama-server
+
+ Note over User,API: Initialization
+ UI->>Stores: initialize()
+ Stores->>DB: load conversations
+ Stores->>API: GET /props
+ API-->>Stores: server config
+ Stores->>API: GET /v1/models
+ API-->>Stores: single model (auto-selected)
+
+ Note over User,API: Chat Flow
+ User->>UI: send message
+ Stores->>DB: save user message
+ Stores->>API: POST /v1/chat/completions (stream)
+ loop streaming
+ API-->>Stores: SSE chunks
+ Stores-->>UI: reactive update
+ end
+ Stores->>DB: save assistant message
+```
+
+### ROUTER Mode (Multi-Model)
+
+See: [`docs/flows/data-flow-simplified-router-mode.md`](docs/flows/data-flow-simplified-router-mode.md)
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant UI
+ participant Stores
+ participant API as llama-server
+
+ Note over User,API: Initialization
+ Stores->>API: GET /props
+ API-->>Stores: {role: "router"}
+ Stores->>API: GET /models
+ API-->>Stores: models[] with status
+
+ Note over User,API: Model Selection
+ User->>UI: select model
+ alt model not loaded
+ Stores->>API: POST /models/load
+ loop poll status
+ Stores->>API: GET /models
+ end
+ Stores->>API: GET /props?model=X
+ end
+ Stores->>Stores: validate modalities
+
+ Note over User,API: Chat Flow
+ Stores->>API: POST /v1/chat/completions {model: X}
+ loop streaming
+ API-->>Stores: SSE chunks + model info
+ end
+```
+
+### Detailed Flow Diagrams
+
+| Flow | Description | File |
+| ------------- | ------------------------------------------ | ----------------------------------------------------------- |
+| Chat | Message lifecycle, streaming, regeneration | [`chat-flow.md`](docs/flows/chat-flow.md) |
+| Models | Loading, unloading, modality caching | [`models-flow.md`](docs/flows/models-flow.md) |
+| Server | Props fetching, role detection | [`server-flow.md`](docs/flows/server-flow.md) |
+| Conversations | CRUD, branching, import/export | [`conversations-flow.md`](docs/flows/conversations-flow.md) |
+| Database | IndexedDB schema, operations | [`database-flow.md`](docs/flows/database-flow.md) |
+| Settings | Parameter sync, user overrides | [`settings-flow.md`](docs/flows/settings-flow.md) |
+
+---
+
+## Architectural Patterns
+
+### 1. Reactive State with Svelte 5 Runes
+
+All stores use Svelte 5's fine-grained reactivity:
+
+```typescript
+// Store with reactive state
+class ChatStore {
+ #isLoading = $state(false);
+ #currentResponse = $state('');
+
+ // Derived values auto-update
+ get isStreaming() {
+ return $derived(this.#isLoading && this.#currentResponse.length > 0);
+ }
+}
+
+// Exported reactive accessors
+export const isLoading = () => chatStore.isLoading;
+export const currentResponse = () => chatStore.currentResponse;
+```
+
+### 2. Unidirectional Data Flow
+
+Data flows in one direction, making state predictable:
+
+```mermaid
+flowchart LR
+ subgraph UI["UI Layer"]
+ A[User Action] --> B[Component]
+ end
+
+ subgraph State["State Layer"]
+ B --> C[Store Method]
+ C --> D[State Update]
+ end
+
+ subgraph IO["I/O Layer"]
+ C --> E[Service]
+ E --> F[API / IndexedDB]
+ F -.->|Response| D
+ end
+
+ D -->|Reactive| B
+```
+
+Components dispatch actions to stores, stores coordinate with services for I/O, and state updates reactively propagate back to the UI.
+
+### 3. Per-Conversation State
+
+Enables concurrent streaming across multiple conversations:
+
+```typescript
+class ChatStore {
+ chatLoadingStates = new Map<string, boolean>();
+ chatStreamingStates = new Map<string, { response: string; messageId: string }>();
+ abortControllers = new Map<string, AbortController>();
+}
+```
+
+### 4. Message Branching with Tree Structure
+
+Conversations are stored as a tree, not a linear list:
+
+```typescript
+interface DatabaseMessage {
+ id: string;
+ parent: string | null; // Points to parent message
+ children: string[]; // List of child message IDs
+ // ...
+}
+
+interface DatabaseConversation {
+ currentNode: string; // Currently viewed branch tip
+ // ...
+}
+```
+
+Navigation between branches updates `currentNode` without losing history.
+
+### 5. Layered Service Architecture
+
+Stores handle state; services handle I/O:
+
+```text
+┌─────────────────┐
+│ Stores │ Business logic, state management
+├─────────────────┤
+│ Services │ API calls, database operations
+├─────────────────┤
+│ Storage/API │ IndexedDB, LocalStorage, HTTP
+└─────────────────┘
+```
+
+### 6. Server Role Abstraction
+
+Single codebase handles both MODEL and ROUTER modes:
+
+```typescript
+// serverStore.ts
+get isRouterMode() {
+ return this.role === ServerRole.ROUTER;
+}
+
+// Components conditionally render based on mode
+{#if isRouterMode()}
+ <ModelsSelector />
+{/if}
+```
+
+### 7. Modality Validation
+
+Prevents sending attachments to incompatible models:
+
+```typescript
+// useModelChangeValidation hook
+const validate = (modelId: string) => {
+ const modelModalities = modelsStore.getModelModalities(modelId);
+ const conversationModalities = conversationsStore.usedModalities;
+
+ // Check if model supports all used modalities
+ if (conversationModalities.hasImages && !modelModalities.vision) {
+ return { valid: false, reason: 'Model does not support images' };
+ }
+ // ...
+};
+```
+
+### 8. Persistent Storage Strategy
+
+Data is persisted across sessions using two storage mechanisms:
+
+```mermaid
+flowchart TB
+ subgraph Browser["Browser Storage"]
+ subgraph IDB["IndexedDB (Dexie)"]
+ C[Conversations]
+ M[Messages]
+ end
+ subgraph LS["LocalStorage"]
+ S[Settings Config]
+ O[User Overrides]
+ T[Theme Preference]
+ end
+ end
+
+ subgraph Stores["Svelte Stores"]
+ CS[conversationsStore] --> C
+ CS --> M
+ SS[settingsStore] --> S
+ SS --> O
+ SS --> T
+ end
+```
+
+- **IndexedDB**: Conversations and messages (large, structured data)
+- **LocalStorage**: Settings, user parameter overrides, theme (small key-value data)
+- **Memory only**: Server props, model list (fetched fresh on each session)
+
+---
+
+## Testing
+
+### Test Types
+
+| Type | Tool | Location | Command |
+| ------------- | ------------------ | ---------------- | ------------------- |
+| **Unit** | Vitest | `tests/unit/` | `npm run test:unit` |
+| **UI/Visual** | Storybook + Vitest | `tests/stories/` | `npm run test:ui` |
+| **E2E** | Playwright | `tests/e2e/` | `npm run test:e2e` |
+| **Client** | Vitest | `tests/client/`. | `npm run test:unit` |
+
+### Running Tests
+
+```bash
+# All tests
+npm run test
+
+# Individual test suites
+npm run test:e2e # End-to-end (requires llama-server)
+npm run test:client # Client-side unit tests
+npm run test:server # Server-side unit tests
+npm run test:ui # Storybook visual tests
+```
+
+### Storybook Development
+
+```bash
+npm run storybook # Start Storybook dev server on :6006
+npm run build-storybook # Build static Storybook
+```
+
+### Linting and Formatting
+
+```bash
+npm run lint # Check code style
+npm run format # Auto-format with Prettier
+npm run check # TypeScript type checking
+```
+
+---
+
+## Project Structure
+
+```text
+tools/server/webui/
+├── src/
+│ ├── lib/
+│ │ ├── components/ # UI components (app/, ui/)
+│ │ ├── hooks/ # Svelte hooks
+│ │ ├── stores/ # State management
+│ │ ├── services/ # API and database services
+│ │ ├── types/ # TypeScript interfaces
+│ │ └── utils/ # Utility functions
+│ ├── routes/ # SvelteKit routes
+│ └── styles/ # Global styles
+├── static/ # Static assets
+├── tests/ # Test files
+├── docs/ # Architecture diagrams
+│ ├── architecture/ # High-level architecture
+│ └── flows/ # Feature-specific flows
+└── .storybook/ # Storybook configuration
+```
+
+---
+
+## Related Documentation
+
+- [llama.cpp Server README](../README.md) - Full server documentation
+- [Multimodal Documentation](../../../docs/multimodal.md) - Image and audio support
+- [Function Calling](../../../docs/function-calling.md) - Tool use capabilities