# GGML-VirtGPU Backend

The GGML-VirtGPU backend enables GGML applications to run machine
learning computations on host hardware while the application itself
runs inside a virtual machine. It uses host-guest shared memory to
efficiently share data buffers between the two sides.

This backend relies on the virtio-gpu device and the VirglRenderer API
Remoting (APIR) component. The backend is split into two libraries:
- a GGML implementation (the "remoting frontend"), running in the
  guest and interacting with the virtio-gpu device
- a VirglRenderer APIR-compatible library (the "remoting backend"),
  running in the host and interacting with VirglRenderer and an actual
  GGML device backend

## OS support

| OS       | Status            | Backend     | CI testing  | Notes
| -------- | ----------------- | ----------- | ----------- | -----
| MacOS 14 | Supported         | ggml-metal  | X           | Working when compiled on MacOS 14
| MacOS 15 | Supported         | ggml-metal  | X           | Working when compiled on MacOS 14 or MacOS 15
| MacOS 26 | Not tested        |             |             |
| Linux    | Under development | ggml-vulkan | not working | Working locally, CI running into deadlocks

## Architecture Overview

The GGML-VirtGPU backend consists of three main components:

```mermaid
graph TD
    %% Nodes

    subgraph GuestVM ["Guest VM - Frontend"]
        App([GGML Application<br/>llama.cpp, etc.])

        direction TB
        Interface[GGML Backend Interface]
        Comm["GGML-VirtGPU<br/>(hypercalls + shared mem)"]

        App --> Interface
        Interface --> Comm
    end

    API[virtio-gpu / virglrenderer API]

    subgraph HostSystem [Host System - Backend]
        direction TB
        Dispatcher[GGML-VirtGPU-Backend]
        BackendLib[GGML Backend library<br/>Metal / Vulkan / CPU / ...]

        Dispatcher --> BackendLib
    end

    %% Connections
    Comm --> API
    API --> HostSystem
```

### Key Components

1. **Guest-side Frontend** (`ggml-virtgpu/`): Implements the GGML backend interface and forwards operations to the host
2. **Host-side Backend** (`ggml-virtgpu/backend/`): Receives forwarded operations and executes them on actual hardware backends
3. **Communication Layer**: Uses virtio-gpu hypercalls and shared memory for efficient data transfer

## Features

- **Dynamic backend loading** on the host side (CPU, CUDA, Metal, etc.)
- **Zero-copy data transfer** via host-guest shared memory pages

## Communication Protocol

### Hypercalls and Shared Memory

The backend uses two primary communication mechanisms:

1. **Hypercalls (`DRM_IOCTL_VIRTGPU_EXECBUFFER`)**: Trigger remote execution from guest to host
2. **Shared Memory Pages**: Zero-copy data transfer for tensors and parameters
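
A minimal sketch of the hypercall mechanism is shown below (an
illustration, not the actual frontend code): the guest opens the
virtio-gpu DRM render node and submits an opaque, serialized command
buffer to the host with `DRM_IOCTL_VIRTGPU_EXECBUFFER`. The payload
layout and the render node path are placeholders.

```cpp
// Hypothetical hypercall submission through the virtio-gpu DRM device.
// The command payload (a plain byte array here) stands in for the
// backend's real serialized command encoding.
#include <cstdint>
#include <fcntl.h>
#include <xf86drm.h>          // drmIoctl(), from libdrm
#include <drm/virtgpu_drm.h>  // struct drm_virtgpu_execbuffer

static int submit_hypercall(int drm_fd, const void *cmd, uint32_t cmd_size) {
    struct drm_virtgpu_execbuffer exec = {};
    exec.command = (uintptr_t) cmd;  // pointer to the serialized command
    exec.size    = cmd_size;         // command buffer size in bytes

    // The ioctl traps out of the guest; the hypervisor routes the buffer
    // to virglrenderer on the host, which dispatches it to the backend.
    return drmIoctl(drm_fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &exec);
}

int main() {
    int fd = open("/dev/dri/renderD128", O_RDWR); // typical render node path
    uint8_t cmd[64] = {};                         // placeholder payload
    return submit_hypercall(fd, cmd, sizeof(cmd));
}
```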

#### Shared Memory Layout

Each connection uses two fixed shared memory buffers, plus dynamically
allocated buffers for tensor data:

- **Data Buffer** (24 MiB): For command/response data and tensor transfers
- **Reply Buffer** (16 KiB): For command replies and status information
- **Data Buffers**: Dynamically allocated host-guest shared buffers,
  exposed as GGML buffers
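
The sizes above translate into a simple per-connection view; the sketch
below is a hypothetical illustration (names and field ordering are not
taken from the sources):

```cpp
// Per-connection shared-memory layout, as described above.
#include <cstddef>

constexpr size_t DATA_BUFFER_SIZE  = 24 * 1024 * 1024; // commands + tensor transfers
constexpr size_t REPLY_BUFFER_SIZE = 16 * 1024;        // replies + status

struct SharedConnection {
    void *data_buffer;   // fixed buffer, mapped in both guest and host
    void *reply_buffer;  // fixed buffer for command replies
    // GGML buffers are backed by additional shared-memory allocations,
    // created on demand and mapped into both address spaces.
};
```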

### APIR Protocol

The VirglRenderer API Remoting protocol defines three command types:

- `HANDSHAKE`: Protocol version negotiation and capability discovery
- `LOADLIBRARY`: Dynamic loading of backend libraries on the host
- `FORWARD`: API function call forwarding
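
As a rough illustration, the command types could be modeled as follows;
the actual identifiers and wire values are defined by the APIR headers,
so treat these names and numbers as placeholders:

```cpp
// Sketch of the three APIR command types (values are assumptions).
#include <cstdint>

enum class ApirCommand : uint32_t {
    HANDSHAKE   = 0, // negotiate protocol version, discover capabilities
    LOADLIBRARY = 1, // ask the host to load the GGML backend library
    FORWARD     = 2, // forward one GGML API call to the host backend
};
```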

### Binary Serialization

Commands and data are serialized using a custom binary protocol with:

- Fixed-size encoding for basic types
- Variable-length arrays with size prefixes
- Buffer bounds checking
- Error recovery mechanisms
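
The sketch below illustrates this serialization style (it is not the
backend's actual encoder): fixed-size scalars, arrays prefixed by their
size, and a bounds check on every read.

```cpp
// Minimal size-prefixed binary encoder/decoder with bounds checking.
#include <cstdint>
#include <cstring>
#include <vector>

struct Encoder {
    std::vector<uint8_t> buf;

    void write_u32(uint32_t v) {                 // fixed-size basic type
        uint8_t tmp[4];
        std::memcpy(tmp, &v, sizeof(v));
        buf.insert(buf.end(), tmp, tmp + 4);
    }
    void write_array(const uint8_t *data, uint32_t n) {
        write_u32(n);                            // size prefix first...
        buf.insert(buf.end(), data, data + n);   // ...then the payload
    }
};

struct Decoder {
    const uint8_t *ptr, *end;

    bool read_u32(uint32_t *v) {                 // bounds-checked read
        if (end - ptr < 4) return false;         // never run past the buffer
        std::memcpy(v, ptr, 4);
        ptr += 4;
        return true;
    }
};
```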

## Supported Operations

### Device Operations
- Device enumeration and capability queries
- Memory information (total/free)
- Backend type detection
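
From the guest application's point of view, these are the regular GGML
device calls; each query is transparently forwarded to the host. A small
example, assuming a current ggml with the `ggml-backend` device API:

```cpp
// Enumerate devices and query memory; on the VirtGPU backend the
// values come from the host-side device.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    ggml_backend_load_all(); // load the available backend libraries

    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);

        size_t free_mem, total_mem;
        ggml_backend_dev_memory(dev, &free_mem, &total_mem);

        printf("device %zu: %s (%zu/%zu MiB free)\n",
               i, ggml_backend_dev_name(dev),
               free_mem / (1024*1024), total_mem / (1024*1024));
    }
}
```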

### Buffer Operations
- Buffer allocation and deallocation
- Tensor data transfer (host ↔ guest)
- Memory copying and clearing

### Computation Operations
- Graph execution forwarding
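
The example below ties the buffer and computation operations together,
assuming a current ggml API: the buffer allocation, tensor uploads,
graph execution, and result read-back are all forwarded to the host
device.

```cpp
// c = a + b, computed on the (remote) device through the standard GGML API.
#include <cstdio>
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main() {
    ggml_backend_load_all();
    ggml_backend_t backend = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_GPU, nullptr);
    if (backend == nullptr) return 1;

    // describe the graph; with no_alloc=true the tensor data lives in a backend buffer
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context *ctx = ggml_init(params);
    struct ggml_tensor *a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor *b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor *c = ggml_add(ctx, a, b);
    struct ggml_cgraph *gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // buffer allocation and tensor uploads are forwarded to the host
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
    const float a_data[4] = {1, 2, 3, 4}, b_data[4] = {10, 20, 30, 40};
    ggml_backend_tensor_set(a, a_data, 0, sizeof(a_data));
    ggml_backend_tensor_set(b, b_data, 0, sizeof(b_data));

    ggml_backend_graph_compute(backend, gf); // executed on the host device

    float out[4];
    ggml_backend_tensor_get(c, out, 0, sizeof(out)); // read back the result
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(backend);
}
```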

## Build Requirements

### Guest-side Dependencies
- `libdrm` for DRM/virtio-gpu communication
- C++20 compatible compiler
- CMake 3.14+

### Host-side Dependencies
- virglrenderer with APIR support (pending upstream review)
- Target backend libraries (libggml-metal, libggml-vulkan, etc.)

## Configuration

### Environment Variables

- `GGML_VIRTGPU_BACKEND_LIBRARY`: Path to the host-side backend library
- `GGML_VIRTGPU_DEBUG`: Enable debug logging

### Build Options

- `GGML_VIRTGPU`: Enable the VirtGPU backend (`ON` or `OFF`, default: `OFF`)
- `GGML_VIRTGPU_BACKEND`: Build the host-side backend component (`ON`, `OFF` or `ONLY`, default: `OFF`)

### System Requirements

- VM with virtio-gpu support
- VirglRenderer with APIR patches
- Compatible backend libraries on the host

## Limitations

- **VM-specific**: Only works in virtual machines with virtio-gpu support
- **Host dependency**: Requires a properly configured host-side backend
- **Latency**: Each operation incurs a small overhead from the guest-to-host transition

## Upstream Status

- This work is pending upstream changes in the VirglRenderer
  project.
  - The backend can be tested with VirglRenderer compiled from source
    using this merge request:
    https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590
- This work is pending changes in the VMM/hypervisor running the
  virtual machine, which needs to know how to route the newly
  introduced APIR capset.
  - The environment variable `VIRGL_ROUTE_VENUS_TO_APIR=1` allows
    using the Venus capset until the relevant hypervisors have been
    patched. However, setting this flag breaks normal Vulkan/Venus
    behavior.
  - The environment variable `GGML_REMOTING_USE_APIR_CAPSET` tells the
    `ggml-virtgpu` backend to use the APIR capset. This will become
    the default when the relevant hypervisors have been patched.
- This work focused on improving the performance of llama.cpp running
  in MacOS containers, and is mainly tested on that platform. Linux
  support (via `krun`) is in progress.

## See Also

- [Development and Testing](VirtGPU/development.md)
- [Backend configuration](VirtGPU/configuration.md)