   1# API
   2
   3> Note: Ollama's API docs are moving to https://docs.ollama.com/api
   4
   5## Endpoints
   6
   7- [Generate a completion](#generate-a-completion)
   8- [Generate a chat completion](#generate-a-chat-completion)
   9- [Create a Model](#create-a-model)
  10- [List Local Models](#list-local-models)
  11- [Show Model Information](#show-model-information)
  12- [Copy a Model](#copy-a-model)
  13- [Delete a Model](#delete-a-model)
  14- [Pull a Model](#pull-a-model)
  15- [Push a Model](#push-a-model)
  16- [Generate Embeddings](#generate-embeddings)
  17- [List Running Models](#list-running-models)
  18- [Version](#version)
  19- [Experimental: Image Generation](#image-generation-experimental)
  20
  21## Conventions
  22
  23### Model names
  24
  25Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q8_0` and `llama3:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
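As a rough illustration of the naming rule above, the following sketch splits a name into its model and tag parts and applies the `latest` default. The helper name `split_model_name` is hypothetical, and the sketch assumes the namespace itself contains no colon.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split a `model:tag` name into (model, tag), defaulting the tag to `latest`."""
    # Assumption: the first colon separates the model from the tag.
    model, sep, tag = name.partition(":")
    if not sep or not tag:
        tag = "latest"
    return model, tag


print(split_model_name("orca-mini:3b-q8_0"))  # ('orca-mini', '3b-q8_0')
print(split_model_name("example/model"))      # ('example/model', 'latest')
```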
  26
  27### Durations
  28
  29All durations are returned in nanoseconds.
  30
  31### Streaming responses
  32
  33Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing `{"stream": false}` for these endpoints.
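A minimal sketch of consuming such a stream on the client side. It assumes each streamed object arrives on its own line (newline-delimited JSON), which the text above does not state explicitly. The helper names and the sample lines are illustrative only, shaped like the `/api/generate` examples in this document.

```python
import json
from typing import Iterable, Iterator


def iter_stream_objects(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a streamed response body."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def collect_generate_text(lines: Iterable[str]) -> str:
    """Concatenate `response` fragments until an object with `done: true` arrives."""
    parts = []
    for obj in iter_stream_objects(lines):
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)


# Hypothetical stream body; a real one would be read from the HTTP response.
sample = [
    '{"model": "llama3.2", "response": "The", "done": false}',
    '{"model": "llama3.2", "response": " sky", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true}',
]
print(collect_generate_text(sample))  # The sky
```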
  34
  35## Generate a completion
  36
  37```
  38POST /api/generate
  39```
  40
  41Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
  42
  43### Parameters
  44
  45- `model`: (required) the [model name](#model-names)
  46- `prompt`: the prompt to generate a response for
  47- `suffix`: the text after the model response
  48- `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
  49- `think`: (for thinking models) should the model think before responding?
  50
  51Advanced parameters (optional):
  52
  53- `format`: the format to return a response in. Format can be `json` or a JSON schema
  54- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.mdx#valid-parameters-and-values) such as `temperature`
  55- `system`: system message to use (overrides what is defined in the `Modelfile`)
  56- `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
  57- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
  58- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API
  59- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
  60- `context` (deprecated): the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
  61
  62Experimental image generation parameters (for image generation models only):
  63
  64> [!WARNING]
  65> These parameters are experimental and may change in future versions.
  66
  67- `width`: width of the generated image in pixels
  68- `height`: height of the generated image in pixels
  69- `steps`: number of diffusion steps
  70
  71#### Structured outputs
  72
  73Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the [structured outputs](#request-structured-outputs) example below.
  74
  75#### JSON mode
  76
  77Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#request-json-mode) below.
  78
  79> [!IMPORTANT]
  80> It's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
  81
  82### Examples
  83
  84#### Generate request (Streaming)
  85
  86##### Request
  87
  88```shell
  89curl http://localhost:11434/api/generate -d '{
  90  "model": "llama3.2",
  91  "prompt": "Why is the sky blue?"
  92}'
  93```
  94
  95##### Response
  96
  97A stream of JSON objects is returned:
  98
  99```json
 100{
 101  "model": "llama3.2",
 102  "created_at": "2023-08-04T08:52:19.385406455-07:00",
 103  "response": "The",
 104  "done": false
 105}
 106```
 107
 108The final response in the stream also includes additional data about the generation:
 109
 110- `total_duration`: total time in nanoseconds spent on the request, including loading the model and evaluating the prompt
 111- `load_duration`: time spent in nanoseconds loading the model
 112- `prompt_eval_count`: number of tokens in the prompt
 113- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
 114- `eval_count`: number of tokens in the response
 115- `eval_duration`: time in nanoseconds spent generating the response
 116- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
 117- `response`: empty if the response was streamed; if not streamed, this will contain the full response
 118
 119To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by `10^9`.
 120
 121```json
 122{
 123  "model": "llama3.2",
 124  "created_at": "2023-08-04T19:22:45.499127Z",
 125  "response": "",
 126  "done": true,
 127  "context": [1, 2, 3],
 128  "total_duration": 10706818083,
 129  "load_duration": 6338219291,
 130  "prompt_eval_count": 26,
 131  "prompt_eval_duration": 130079000,
 132  "eval_count": 259,
 133  "eval_duration": 4232710000
 134}
 135```
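The tokens-per-second calculation can be sketched as follows, using the `eval_count` and `eval_duration` values from the final response object above. The function name is hypothetical.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed: tokens divided by seconds (the duration is in nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / eval_duration_ns * 1e9


# Values copied from the final response object above.
final = {"eval_count": 259, "eval_duration": 4232710000}
rate = tokens_per_second(final["eval_count"], final["eval_duration"])
print(f"{rate:.2f} token/s")  # 61.19 token/s
```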
 136
 137#### Request (No streaming)
 138
 139##### Request
 140
 141A response can be received in one reply when streaming is off.
 142
 143```shell
 144curl http://localhost:11434/api/generate -d '{
 145  "model": "llama3.2",
 146  "prompt": "Why is the sky blue?",
 147  "stream": false
 148}'
 149```
 150
 151##### Response
 152
 153If `stream` is set to `false`, the response will be a single JSON object:
 154
 155```json
 156{
 157  "model": "llama3.2",
 158  "created_at": "2023-08-04T19:22:45.499127Z",
 159  "response": "The sky is blue because it is the color of the sky.",
 160  "done": true,
 161  "context": [1, 2, 3],
 162  "total_duration": 5043500667,
 163  "load_duration": 5025959,
 164  "prompt_eval_count": 26,
 165  "prompt_eval_duration": 325953000,
 166  "eval_count": 290,
 167  "eval_duration": 4709213000
 168}
 169```
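For readers calling the endpoint from code rather than curl, here is a minimal sketch of the same non-streaming request using only the Python standard library. It assumes the server is reachable at the `localhost:11434` address used in the curl examples. The function names are hypothetical, and only the request-building step runs without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address used in the curl examples


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> dict:
    """Send the request and decode the single JSON object that comes back."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_generate_request("llama3.2", "Why is the sky blue?")
print(req.get_method(), req.full_url)
```

With a server running, `generate("llama3.2", "Why is the sky blue?")["response"]` would hold the full text.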
 170
 171#### Request (with suffix)
 172
 173##### Request
 174
 175```shell
 176curl http://localhost:11434/api/generate -d '{
 177  "model": "codellama:code",
 178  "prompt": "def compute_gcd(a, b):",
 179  "suffix": "    return result",
 180  "options": {
 181    "temperature": 0
 182  },
 183  "stream": false
 184}'
 185```
 186
 187##### Response
 188
 189```json5
 190{
 191  "model": "codellama:code",
 192  "created_at": "2024-07-22T20:47:51.147561Z",
 193  "response": "\n  if a == 0:\n    return b\n  else:\n    return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n  result = (a * b) / compute_gcd(a, b)\n",
 194  "done": true,
 195  "done_reason": "stop",
 196  "context": [...],
 197  "total_duration": 1162761250,
 198  "load_duration": 6683708,
 199  "prompt_eval_count": 17,
 200  "prompt_eval_duration": 201222000,
 201  "eval_count": 63,
 202  "eval_duration": 953997000
 203}
 204```
 205
 206#### Request (Structured outputs)
 207
 208##### Request
 209
 210```shell
 211curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
 212  "model": "llama3.1:8b",
 213  "prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
 214  "stream": false,
 215  "format": {
 216    "type": "object",
 217    "properties": {
 218      "age": {
 219        "type": "integer"
 220      },
 221      "available": {
 222        "type": "boolean"
 223      }
 224    },
 225    "required": [
 226      "age",
 227      "available"
 228    ]
 229  }
 230}'
 231```
 232
 233##### Response
 234
 235```json
 236{
 237  "model": "llama3.1:8b",
 238  "created_at": "2024-12-06T00:48:09.983619Z",
 239  "response": "{\n  \"age\": 22,\n  \"available\": true\n}",
 240  "done": true,
 241  "done_reason": "stop",
 242  "context": [1, 2, 3],
 243  "total_duration": 1075509083,
 244  "load_duration": 567678166,
 245  "prompt_eval_count": 28,
 246  "prompt_eval_duration": 236000000,
 247  "eval_count": 16,
 248  "eval_duration": 269000000
 249}
 250```
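Because `response` is a string that contains JSON, a client still has to decode it. The sketch below does that for a trimmed copy of the response object above and checks the keys the schema marks as `required`. The helper name is hypothetical, and this is not a full JSON Schema validation.

```python
import json


def parse_structured_response(body: dict, required: list[str]) -> dict:
    """Decode the JSON string in `response` and check the schema's required keys."""
    data = json.loads(body["response"])
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Trimmed copy of the response object above.
body = {
    "model": "llama3.1:8b",
    "response": "{\n  \"age\": 22,\n  \"available\": true\n}",
    "done": True,
}
result = parse_structured_response(body, ["age", "available"])
print(result)  # {'age': 22, 'available': True}
```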
 251
 252#### Request (JSON mode)
 253
 254> [!IMPORTANT]
 255> When `format` is set to `json`, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON.
 256
 257##### Request
 258
 259```shell
 260curl http://localhost:11434/api/generate -d '{
 261  "model": "llama3.2",
 262  "prompt": "What color is the sky at different times of the day? Respond using JSON",
 263  "format": "json",
 264  "stream": false
 265}'
 266```
 267
 268##### Response
 269
 270```json
 271{
 272  "model": "llama3.2",
 273  "created_at": "2023-11-09T21:07:55.186497Z",
 274  "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
 275  "done": true,
 276  "context": [1, 2, 3],
 277  "total_duration": 4648158584,
 278  "load_duration": 4071084,
 279  "prompt_eval_count": 36,
 280  "prompt_eval_duration": 439038000,
 281  "eval_count": 180,
 282  "eval_duration": 4196918000
 283}
 284```
 285
 286The value of `response` will be a string containing JSON similar to:
 287
 288```json
 289{
 290  "morning": {
 291    "color": "blue"
 292  },
 293  "noon": {
 294    "color": "blue-gray"
 295  },
 296  "afternoon": {
 297    "color": "warm gray"
 298  },
 299  "evening": {
 300    "color": "orange"
 301  }
 302}
 303```
 304
 305#### Request (with images)
 306
 307To submit images to multimodal models such as `llava` or `bakllava`, provide a list of base64-encoded `images`:
 308
 309##### Request
 310
 311```shell
 312curl http://localhost:11434/api/generate -d '{
 313  "model": "llava",
 314  "prompt":"What is in this picture?",
 315  "stream": false,
 316  "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS
4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrU
wKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
 317}'
 318```
 319
 320##### Response
 321
 322```json
 323{
 324  "model": "llava",
 325  "created_at": "2023-11-03T15:36:02.583064Z",
 326  "response": "A happy cartoon character, which is cute and cheerful.",
 327  "done": true,
 328  "context": [1, 2, 3],
 329  "total_duration": 2938432250,
 330  "load_duration": 2559292,
 331  "prompt_eval_count": 1,
 332  "prompt_eval_duration": 2195557000,
 333  "eval_count": 44,
 334  "eval_duration": 736432000
 335}
 336```
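A short sketch of how the base64 strings in `images` might be produced from raw image bytes. The helper names are hypothetical. The 8-byte PNG signature stands in for a real file's contents, which would normally come from something like `open(path, "rb").read()`.

```python
import base64
import json


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text form used in the `images` list."""
    return base64.b64encode(image_bytes).decode("ascii")


def image_prompt_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON request body for a single-image prompt."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [encode_image(image_bytes)],
    })


# Placeholder bytes: the PNG file signature only, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
print(encode_image(png_signature))  # iVBORw0KGgo=
```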
 337
 338#### Request (Raw Mode)
 339
 340In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable templating. Also note that raw mode will not return a context.
 341
 342##### Request
 343
 344```shell
 345curl http://localhost:11434/api/generate -d '{
 346  "model": "mistral",
 347  "prompt": "[INST] why is the sky blue? [/INST]",
 348  "raw": true,
 349  "stream": false
 350}'
 351```
 352
 353#### Request (Reproducible outputs)
 354
 355For reproducible outputs, set `seed` to a number:
 356
 357##### Request
 358
 359```shell
 360curl http://localhost:11434/api/generate -d '{
 361  "model": "mistral",
 362  "prompt": "Why is the sky blue?",
 363  "options": {
 364    "seed": 123
 365  }
 366}'
 367```
 368
 369##### Response
 370
 371```json
 372{
 373  "model": "mistral",
 374  "created_at": "2023-11-03T15:36:02.583064Z",
 375  "response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
 376  "done": true,
 377  "total_duration": 8493852375,
 378  "load_duration": 6589624375,
 379  "prompt_eval_count": 14,
 380  "prompt_eval_duration": 119039000,
 381  "eval_count": 110,
 382  "eval_duration": 1779061000
 383}
 384```
 385
 386#### Generate request (With options)
 387
 388If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.
 389
 390##### Request
 391
 392```shell
 393curl http://localhost:11434/api/generate -d '{
 394  "model": "llama3.2",
 395  "prompt": "Why is the sky blue?",
 396  "stream": false,
 397  "options": {
 398    "num_keep": 5,
 399    "seed": 42,
 400    "num_predict": 100,
 401    "top_k": 20,
 402    "top_p": 0.9,
 403    "min_p": 0.0,
 404    "typical_p": 0.7,
 405    "repeat_last_n": 33,
 406    "temperature": 0.8,
 407    "repeat_penalty": 1.2,
 408    "presence_penalty": 1.5,
 409    "frequency_penalty": 1.0,
 410    "penalize_newline": true,
 411    "stop": ["\n", "user:"],
 412    "numa": false,
 413    "num_ctx": 1024,
 414    "num_batch": 2,
 415    "num_gpu": 1,
 416    "main_gpu": 0,
 417    "use_mmap": true,
 418    "num_thread": 8
 419  }
 420}'
 421```
 422
 423##### Response
 424
 425```json
 426{
 427  "model": "llama3.2",
 428  "created_at": "2023-08-04T19:22:45.499127Z",
 429  "response": "The sky is blue because it is the color of the sky.",
 430  "done": true,
 431  "context": [1, 2, 3],
 432  "total_duration": 4935886791,
 433  "load_duration": 534986708,
 434  "prompt_eval_count": 26,
 435  "prompt_eval_duration": 107345000,
 436  "eval_count": 237,
 437  "eval_duration": 4289432000
 438}
 439```
 440
 441#### Load a model
 442
 443If an empty prompt is provided, the model will be loaded into memory.
 444
 445##### Request
 446
 447```shell
 448curl http://localhost:11434/api/generate -d '{
 449  "model": "llama3.2"
 450}'
 451```
 452
 453##### Response
 454
 455A single JSON object is returned:
 456
 457```json
 458{
 459  "model": "llama3.2",
 460  "created_at": "2023-12-18T19:52:07.071755Z",
 461  "response": "",
 462  "done": true
 463}
 464```
 465
 466#### Unload a model
 467
 468If an empty prompt is provided and the `keep_alive` parameter is set to `0`, a model will be unloaded from memory.
 469
 470##### Request
 471
 472```shell
 473curl http://localhost:11434/api/generate -d '{
 474  "model": "llama3.2",
 475  "keep_alive": 0
 476}'
 477```
 478
 479##### Response
 480
 481A single JSON object is returned:
 482
 483```json
 484{
 485  "model": "llama3.2",
 486  "created_at": "2024-09-12T03:54:03.516566Z",
 487  "response": "",
 488  "done": true,
 489  "done_reason": "unload"
 490}
 491```
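The load and unload requests differ only in the `keep_alive` field. This sketch builds both request bodies and checks a response for the `unload` reason shown above. All three function names are hypothetical.

```python
import json


def load_payload(model: str) -> str:
    """Request body with no prompt: asks the server to load the model."""
    return json.dumps({"model": model})


def unload_payload(model: str) -> str:
    """Request body with no prompt and `keep_alive` 0: asks the server to unload it."""
    return json.dumps({"model": model, "keep_alive": 0})


def was_unloaded(response: dict) -> bool:
    """True when a response carries the `unload` done_reason shown above."""
    return response.get("done") is True and response.get("done_reason") == "unload"


print(unload_payload("llama3.2"))  # {"model": "llama3.2", "keep_alive": 0}
```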
 492
 493## Generate a chat completion
 494
 495```
 496POST /api/chat
 497```
 498
 499Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using `"stream": false`. The final response object will include statistics and additional data from the request.
 500
 501### Parameters
 502
 503- `model`: (required) the [model name](#model-names)
 504- `messages`: the messages of the chat; this can be used to keep a chat memory
 505- `tools`: list of tools in JSON for the model to use if supported
 506- `think`: (for thinking models) should the model think before responding?
 507
 508The `message` object has the following fields:
 509
 510- `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
 511- `content`: the content of the message
 512- `thinking`: (for thinking models) the model's thinking process
 513- `images` (optional): a list of images to include in the message (for multimodal models such as `llava`)
 514- `tool_calls` (optional): a list of tools in JSON that the model wants to use
 515- `tool_name` (optional): add the name of the tool that was executed to inform the model of the result
 516
 517Advanced parameters (optional):
 518
 519- `format`: the format to return a response in. Format can be `json` or a JSON schema.
 520- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.mdx#valid-parameters-and-values) such as `temperature`
 521- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
 522- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
 523
 524### Tool calling
 525
 526Tool calling is supported by providing a list of tools in the `tools` parameter. The model will generate a response that includes a list of tool calls. See the [Chat request (Streaming with tools)](#chat-request-streaming-with-tools) example below.
 527
 528Models can also explain the result of the tool call in the response. See the [Chat request (With history, with tools)](#chat-request-with-history-with-tools) example below.
 529
 530[See models with tool calling capabilities](https://ollama.com/search?c=tool).
 531
 532### Structured outputs
 533
 534Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the [Chat request (Structured outputs)](#chat-request-structured-outputs) example below.
 535
 536### Examples
 537
 538#### Chat request (Streaming)
 539
 540##### Request
 541
 542Send a chat message with a streaming response.
 543
 544```shell
 545curl http://localhost:11434/api/chat -d '{
 546  "model": "llama3.2",
 547  "messages": [
 548    {
 549      "role": "user",
 550      "content": "why is the sky blue?"
 551    }
 552  ]
 553}'
 554```
 555
 556##### Response
 557
 558A stream of JSON objects is returned:
 559
 560```json
 561{
 562  "model": "llama3.2",
 563  "created_at": "2023-08-04T08:52:19.385406455-07:00",
 564  "message": {
 565    "role": "assistant",
 566    "content": "The",
 567    "images": null
 568  },
 569  "done": false
 570}
 571```
 572
 573Final response:
 574
 575```json
 576{
 577  "model": "llama3.2",
 578  "created_at": "2023-08-04T19:22:45.499127Z",
 579  "message": {
 580    "role": "assistant",
 581    "content": ""
 582  },
 583  "done": true,
 584  "total_duration": 4883583458,
 585  "load_duration": 1334875,
 586  "prompt_eval_count": 26,
 587  "prompt_eval_duration": 342546000,
 588  "eval_count": 282,
 589  "eval_duration": 4535599000
 590}
 591```
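When streaming from `/api/chat`, the text arrives in `message.content` fragments rather than in `response`. The following sketch merges them into one message and keeps the final object for its statistics. It assumes one JSON object per line; the function name and sample lines are illustrative.

```python
import json
from typing import Iterable


def collect_chat_message(lines: Iterable[str]) -> tuple[dict, dict]:
    """Merge streamed chat chunks into one message; also return the final object."""
    content = []
    role = "assistant"
    final: dict = {}
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        message = obj.get("message", {})
        role = message.get("role", role)
        content.append(message.get("content", ""))
        if obj.get("done"):
            final = obj
            break
    return {"role": role, "content": "".join(content)}, final


# Hypothetical stream body shaped like the chat responses above.
sample = [
    '{"message": {"role": "assistant", "content": "The"}, "done": false}',
    '{"message": {"role": "assistant", "content": " sky"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true, "eval_count": 282}',
]
message, stats = collect_chat_message(sample)
print(message)  # {'role': 'assistant', 'content': 'The sky'}
```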
 592
 593#### Chat request (Streaming with tools)
 594
 595##### Request
 596
 597```shell
 598curl http://localhost:11434/api/chat -d '{
 599  "model": "llama3.2",
 600  "messages": [
 601    {
 602      "role": "user",
 603      "content": "what is the weather in tokyo?"
 604    }
 605  ],
 606  "tools": [
 607    {
 608      "type": "function",
 609      "function": {
 610        "name": "get_weather",
 611        "description": "Get the weather in a given city",
 612        "parameters": {
 613          "type": "object",
 614          "properties": {
 615            "city": {
 616              "type": "string",
 617              "description": "The city to get the weather for"
 618            }
 619          },
 620          "required": ["city"]
 621        }
 622      }
 623    }
 624  ],
 625  "stream": true
 626}'
 627```
 628
 629##### Response
 630
 631A stream of JSON objects is returned:
 632
 633```json
 634{
 635  "model": "llama3.2",
 636  "created_at": "2025-07-07T20:22:19.184789Z",
 637  "message": {
 638    "role": "assistant",
 639    "content": "",
 640    "tool_calls": [
 641      {
 642        "function": {
 643          "name": "get_weather",
 644          "arguments": {
 645            "city": "Tokyo"
 646          }
 647        }
 648      }
 649    ]
 650  },
 651  "done": false
 652}
 653```
 654
 655Final response:
 656
 657```json
 658{
 659  "model": "llama3.2",
 660  "created_at": "2025-07-07T20:22:19.19314Z",
 661  "message": {
 662    "role": "assistant",
 663    "content": ""
 664  },
 665  "done_reason": "stop",
 666  "done": true,
 667  "total_duration": 182242375,
 668  "load_duration": 41295167,
 669  "prompt_eval_count": 169,
 670  "prompt_eval_duration": 24573166,
 671  "eval_count": 15,
 672  "eval_duration": 115959084
 673}
 674```
 675
 676#### Chat request (No streaming)
 677
 678##### Request
 679
 680```shell
 681curl http://localhost:11434/api/chat -d '{
 682  "model": "llama3.2",
 683  "messages": [
 684    {
 685      "role": "user",
 686      "content": "why is the sky blue?"
 687    }
 688  ],
 689  "stream": false
 690}'
 691```
 692
 693##### Response
 694
 695```json
 696{
 697  "model": "llama3.2",
 698  "created_at": "2023-12-12T14:13:43.416799Z",
 699  "message": {
 700    "role": "assistant",
 701    "content": "Hello! How are you today?"
 702  },
 703  "done": true,
 704  "total_duration": 5191566416,
 705  "load_duration": 2154458,
 706  "prompt_eval_count": 26,
 707  "prompt_eval_duration": 383809000,
 708  "eval_count": 298,
 709  "eval_duration": 4799921000
 710}
 711```
 712
 713#### Chat request (No streaming, with tools)
 714
 715##### Request
 716
 717```shell
 718curl http://localhost:11434/api/chat -d '{
 719  "model": "llama3.2",
 720  "messages": [
 721    {
 722      "role": "user",
 723      "content": "what is the weather in tokyo?"
 724    }
 725  ],
 726  "tools": [
 727    {
 728      "type": "function",
 729      "function": {
 730        "name": "get_weather",
 731        "description": "Get the weather in a given city",
 732        "parameters": {
 733          "type": "object",
 734          "properties": {
 735            "city": {
 736              "type": "string",
 737              "description": "The city to get the weather for"
 738            }
 739          },
 740          "required": ["city"]
 741        }
 742      }
 743    }
 744  ],
 745  "stream": false
 746}'
 747```
 748
 749##### Response
 750
 751```json
 752{
 753  "model": "llama3.2",
 754  "created_at": "2025-07-07T20:32:53.844124Z",
 755  "message": {
 756    "role": "assistant",
 757    "content": "",
 758    "tool_calls": [
 759      {
 760        "function": {
 761          "name": "get_weather",
 762          "arguments": {
 763            "city": "Tokyo"
 764          }
 765        }
 766      }
 767    ]
 768  },
 769  "done_reason": "stop",
 770  "done": true,
 771  "total_duration": 3244883583,
 772  "load_duration": 2969184542,
 773  "prompt_eval_count": 169,
 774  "prompt_eval_duration": 141656333,
 775  "eval_count": 18,
 776  "eval_duration": 133293625
 777}
 778```
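One way a client might act on `tool_calls` is to look up each requested function locally, run it with the supplied arguments, and wrap the result in a `tool` message for the next request. This is only a sketch: `get_weather` here is a stand-in implementation, and `TOOLS` and `run_tool_calls` are hypothetical names.

```python
from typing import Callable


def get_weather(city: str) -> str:
    """Stand-in for a real weather lookup; returns a fixed string."""
    return f"11 degrees celsius in {city}"


# Registry mapping tool names (as declared in `tools`) to local callables.
TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list[dict]:
    """Execute each requested tool and return `tool` messages for the history."""
    results = []
    for call in message.get("tool_calls", []):
        function = call["function"]
        name = function["name"]
        if name not in TOOLS:
            raise KeyError(f"model requested unknown tool: {name}")
        output = TOOLS[name](**function["arguments"])
        results.append({"role": "tool", "content": output, "tool_name": name})
    return results


# The assistant message from the response above.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}
    ],
}
print(run_tool_calls(assistant_message))
```

The returned messages use the same `role`, `content`, and `tool_name` fields as the `tool` entry in the history-with-tools example.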
 779
 780#### Chat request (Structured outputs)
 781
 782##### Request
 783
 784```shell
 785curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
 786  "model": "llama3.1",
 787  "messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
 788  "stream": false,
 789  "format": {
 790    "type": "object",
 791    "properties": {
 792      "age": {
 793        "type": "integer"
 794      },
 795      "available": {
 796        "type": "boolean"
 797      }
 798    },
 799    "required": [
 800      "age",
 801      "available"
 802    ]
 803  },
 804  "options": {
 805    "temperature": 0
 806  }
 807}'
 808```
 809
 810##### Response
 811
 812```json
 813{
 814  "model": "llama3.1",
 815  "created_at": "2024-12-06T00:46:58.265747Z",
 816  "message": {
 817    "role": "assistant",
 818    "content": "{\"age\": 22, \"available\": false}"
 819  },
 820  "done_reason": "stop",
 821  "done": true,
 822  "total_duration": 2254970291,
 823  "load_duration": 574751416,
 824  "prompt_eval_count": 34,
 825  "prompt_eval_duration": 1502000000,
 826  "eval_count": 12,
 827  "eval_duration": 175000000
 828}
 829```
 830
 831#### Chat request (With History)
 832
 833Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.
 834
 835##### Request
 836
 837```shell
 838curl http://localhost:11434/api/chat -d '{
 839  "model": "llama3.2",
 840  "messages": [
 841    {
 842      "role": "user",
 843      "content": "why is the sky blue?"
 844    },
 845    {
 846      "role": "assistant",
 847      "content": "due to rayleigh scattering."
 848    },
 849    {
 850      "role": "user",
 851      "content": "how is that different than mie scattering?"
 852    }
 853  ]
 854}'
 855```
 856
 857##### Response
 858
 859A stream of JSON objects is returned:
 860
 861```json
 862{
 863  "model": "llama3.2",
 864  "created_at": "2023-08-04T08:52:19.385406455-07:00",
 865  "message": {
 866    "role": "assistant",
 867    "content": "The"
 868  },
 869  "done": false
 870}
 871```
 872
 873Final response:
 874
 875```json
 876{
 877  "model": "llama3.2",
 878  "created_at": "2023-08-04T19:22:45.499127Z",
 879  "done": true,
 880  "total_duration": 8113331500,
 881  "load_duration": 6396458,
 882  "prompt_eval_count": 61,
 883  "prompt_eval_duration": 398801000,
 884  "eval_count": 468,
 885  "eval_duration": 7701267000
 886}
 887```
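Since the history is sent with every request, the client is responsible for keeping it. A minimal sketch of that bookkeeping follows, with hypothetical helper names: each completed exchange is appended to the list, and the next request body is built from the list plus the new user message.

```python
import json


def extend_history(history: list[dict], user_text: str, reply: dict) -> list[dict]:
    """Return a new history with the user's turn and the assistant's reply appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": reply["role"], "content": reply["content"]},
    ]


def chat_payload(model: str, history: list[dict], user_text: str) -> str:
    """Build the /api/chat request body for the next user message."""
    messages = history + [{"role": "user", "content": user_text}]
    return json.dumps({"model": model, "messages": messages})


history: list[dict] = []
history = extend_history(
    history,
    "why is the sky blue?",
    {"role": "assistant", "content": "due to rayleigh scattering."},
)
body = chat_payload("llama3.2", history, "how is that different than mie scattering?")
print(len(json.loads(body)["messages"]))  # 3
```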
 888
 889#### Chat request (With history, with tools)
 890
 891##### Request
 892
 893The history below includes the assistant message that requested the tool call, followed by a `tool` message carrying the tool's result.
 894
 895```shell
 896curl http://localhost:11434/api/chat -d '{
 897  "model": "llama3.2",
 898  "messages": [
 899    {
 900      "role": "user",
 901      "content": "what is the weather in Toronto?"
 902    },
 903    {
 904      "role": "assistant",
 905      "content": "",
 906      "tool_calls": [
 907        {
 908          "function": {
 909            "name": "get_weather",
 910            "arguments": {
 911              "city": "Toronto"
 912            }
 913          }
 914        }
 915      ]
 916    },
 917    {
 918      "role": "tool",
 919      "content": "11 degrees celsius",
 920      "tool_name": "get_weather"
 921    }
 922  ],
 923  "stream": false,
 924  "tools": [
 925    {
 926      "type": "function",
 927      "function": {
 928        "name": "get_weather",
 929        "description": "Get the weather in a given city",
 930        "parameters": {
 931          "type": "object",
 932          "properties": {
 933            "city": {
 934              "type": "string",
 935              "description": "The city to get the weather for"
 936            }
 937          },
 938          "required": ["city"]
 939        }
 940      }
 941    }
 942  ]
 943}'
 944```
 945
 946##### Response
 947
 948```json
 949{
 950  "model": "llama3.2",
 951  "created_at": "2025-07-07T20:43:37.688511Z",
 952  "message": {
 953    "role": "assistant",
 954    "content": "The current temperature in Toronto is 11°C."
 955  },
 956  "done_reason": "stop",
 957  "done": true,
 958  "total_duration": 890771750,
 959  "load_duration": 707634750,
 960  "prompt_eval_count": 94,
 961  "prompt_eval_duration": 91703208,
 962  "eval_count": 11,
 963  "eval_duration": 90282125
 964}
 965```
 966
 967#### Chat request (with images)
 968
 969##### Request
 970
 971Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.
 972
 973```shell
 974curl http://localhost:11434/api/chat -d '{
 975  "model": "llava",
 976  "messages": [
 977    {
 978      "role": "user",
 979      "content": "what is in this image?",
 980      "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtL
SQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA
8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
 981    }
 982  ]
 983}'
 984```
 985
 986##### Response
 987
 988```json
 989{
 990  "model": "llava",
 991  "created_at": "2023-12-13T22:42:50.203334Z",
 992  "message": {
 993    "role": "assistant",
 994    "content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
 995    "images": null
 996  },
 997  "done": true,
 998  "total_duration": 1668506709,
 999  "load_duration": 1986209,
1000  "prompt_eval_count": 26,
1001  "prompt_eval_duration": 359682000,
1002  "eval_count": 83,
1003  "eval_duration": 1303285000
1004}
1005```
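
The `images` array above expects each image as a Base64 string. The following is a minimal Python sketch of how such a request body could be assembled; the helper name `image_chat_payload` is hypothetical, and the eight-byte PNG signature only stands in for real image data.

```python
import base64
import json


def image_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a /api/chat request body carrying one Base64-encoded image."""
    # Base64 output is ASCII, so it can be embedded directly in the JSON body.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]}
        ],
    }


# Stand-in bytes (the PNG file signature); a real call would read an image file.
payload = image_chat_payload("llava", "what is in this image?", b"\x89PNG\r\n\x1a\n")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the `curl` request shown above.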
1006
1007#### Chat request (Reproducible outputs)
1008
1009##### Request
1010
1011```shell
1012curl http://localhost:11434/api/chat -d '{
1013  "model": "llama3.2",
1014  "messages": [
1015    {
1016      "role": "user",
1017      "content": "Hello!"
1018    }
1019  ],
1020  "options": {
1021    "seed": 101,
1022    "temperature": 0
1023  }
1024}'
1025```
1026
1027##### Response
1028
1029```json
1030{
1031  "model": "llama3.2",
1032  "created_at": "2023-12-12T14:13:43.416799Z",
1033  "message": {
1034    "role": "assistant",
1035    "content": "Hello! How are you today?"
1036  },
1037  "done": true,
1038  "total_duration": 5191566416,
1039  "load_duration": 2154458,
1040  "prompt_eval_count": 26,
1041  "prompt_eval_duration": 383809000,
1042  "eval_count": 298,
1043  "eval_duration": 4799921000
1044}
1045```
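
Because all durations are reported in nanoseconds, throughput can be derived from `eval_count` and `eval_duration`. This is a small worked sketch in Python using the numbers from the response above; the function name `tokens_per_second` is an illustrative choice, not part of the API.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict) -> float:
    """Generation throughput: tokens produced per second of eval time."""
    return response["eval_count"] / response["eval_duration"] * NANOSECONDS_PER_SECOND


# Values copied from the example response above.
response = {
    "eval_count": 298,
    "eval_duration": 4799921000,
    "total_duration": 5191566416,
}

print(f"{tokens_per_second(response):.1f} tokens/s")
print(f"{response['total_duration'] / NANOSECONDS_PER_SECOND:.2f} s total")
```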
1046
1047#### Chat request (with tools)
1048
1049##### Request
1050
1051```shell
1052curl http://localhost:11434/api/chat -d '{
1053  "model": "llama3.2",
1054  "messages": [
1055    {
1056      "role": "user",
1057      "content": "What is the weather today in Paris?"
1058    }
1059  ],
1060  "stream": false,
1061  "tools": [
1062    {
1063      "type": "function",
1064      "function": {
1065        "name": "get_current_weather",
1066        "description": "Get the current weather for a location",
1067        "parameters": {
1068          "type": "object",
1069          "properties": {
1070            "location": {
1071              "type": "string",
1072              "description": "The location to get the weather for, e.g. San Francisco, CA"
1073            },
1074            "format": {
1075              "type": "string",
1076              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
1077              "enum": ["celsius", "fahrenheit"]
1078            }
1079          },
1080          "required": ["location", "format"]
1081        }
1082      }
1083    }
1084  ]
1085}'
1086```
1087
1088##### Response
1089
1090```json
1091{
1092  "model": "llama3.2",
1093  "created_at": "2024-07-22T20:33:28.123648Z",
1094  "message": {
1095    "role": "assistant",
1096    "content": "",
1097    "tool_calls": [
1098      {
1099        "function": {
1100          "name": "get_current_weather",
1101          "arguments": {
1102            "format": "celsius",
1103            "location": "Paris, FR"
1104          }
1105        }
1106      }
1107    ]
1108  },
1109  "done_reason": "stop",
1110  "done": true,
1111  "total_duration": 885095291,
1112  "load_duration": 3753500,
1113  "prompt_eval_count": 122,
1114  "prompt_eval_duration": 328493000,
1115  "eval_count": 33,
1116  "eval_duration": 552222000
1117}
1118```
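
When a response contains `tool_calls`, the client is expected to run the requested function and append the result to the message history as a `tool` message (compare the request with tool results earlier in this section). The sketch below shows one way this dispatch could look in Python. The `TOOLS` registry, the `run_tool_calls` helper, and the placeholder weather implementation are all assumptions made for illustration.

```python
import json


def get_current_weather(location: str, format: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"11 degrees {format} in {location}"


# Maps the tool names advertised in the request's "tools" list to callables.
TOOLS = {"get_current_weather": get_current_weather}


def run_tool_calls(message: dict) -> list:
    """Return one "tool" message per tool call found in an assistant message."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        output = TOOLS[function["name"]](**function["arguments"])
        results.append(
            {"role": "tool", "content": str(output), "tool_name": function["name"]}
        )
    return results


# The assistant message from the response above.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "function": {
                "name": "get_current_weather",
                "arguments": {"format": "celsius", "location": "Paris, FR"},
            }
        }
    ],
}

history = [
    {"role": "user", "content": "What is the weather today in Paris?"},
    assistant_message,
    *run_tool_calls(assistant_message),
]
print(json.dumps(history[-1]))
```

Sending `history` as the `messages` array of a follow-up `/api/chat` request lets the model produce its final answer from the tool output.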
1119
1120#### Load a model
1121
1122If the messages array is empty, the model will be loaded into memory.
1123
1124##### Request
1125
1126```shell
1127curl http://localhost:11434/api/chat -d '{
1128  "model": "llama3.2",
1129  "messages": []
1130}'
1131```
1132
1133##### Response
1134
1135```json
1136{
1137  "model": "llama3.2",
1138  "created_at": "2024-09-12T21:17:29.110811Z",
1139  "message": {
1140    "role": "assistant",
1141    "content": ""
1142  },
1143  "done_reason": "load",
1144  "done": true
1145}
1146```
1147
1148#### Unload a model
1149
1150If the messages array is empty and the `keep_alive` parameter is set to `0`, a model will be unloaded from memory.
1151
1152##### Request
1153
1154```shell
1155curl http://localhost:11434/api/chat -d '{
1156  "model": "llama3.2",
1157  "messages": [],
1158  "keep_alive": 0
1159}'
1160```
1161
1162##### Response
1163
1164A single JSON object is returned:
1165
1166```json
1167{
1168  "model": "llama3.2",
1169  "created_at": "2024-09-12T21:33:17.547535Z",
1170  "message": {
1171    "role": "assistant",
1172    "content": ""
1173  },
1174  "done_reason": "unload",
1175  "done": true
1176}
1177```
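
The load and unload requests differ only in the `keep_alive` field. As a rough Python sketch (the helper names `lifecycle_payload` and `is_lifecycle_ack` are hypothetical), the two request bodies and a check on the `done_reason` values shown above could be written as:

```python
import json


def lifecycle_payload(model: str, unload: bool = False) -> dict:
    """Build the /api/chat body that loads (default) or unloads a model."""
    payload = {"model": model, "messages": []}
    if unload:
        # keep_alive of 0 asks the server to unload the model.
        payload["keep_alive"] = 0
    return payload


def is_lifecycle_ack(response: dict) -> bool:
    """True if the response reports a completed load or unload."""
    return bool(response.get("done")) and response.get("done_reason") in ("load", "unload")


print(json.dumps(lifecycle_payload("llama3.2")))
print(json.dumps(lifecycle_payload("llama3.2", unload=True)))
```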
1178
1179## Create a Model
1180
1181```
1182POST /api/create
1183```
1184
1185Create a model from:
1186
1187- another model;
1188- a safetensors directory; or
1189- a GGUF file.
1190
If you are creating a model from a safetensors directory or from a GGUF file, you must [create a blob](#push-a-blob) for each of the files and then use the file name and SHA256 digest associated with each blob in the `files` field.
1192
1193### Parameters
1194
1195- `model`: name of the model to create
1196- `from`: (optional) name of an existing model to create the new model from
1197- `files`: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
- `adapters`: (optional) a dictionary of file names to SHA256 digests of blobs for LoRA adapters
1199- `template`: (optional) the prompt template for the model
1200- `license`: (optional) a string or list of strings containing the license or licenses for the model
1201- `system`: (optional) a string containing the system prompt for the model
1202- `parameters`: (optional) a dictionary of parameters for the model (see [Modelfile](./modelfile.mdx#valid-parameters-and-values) for a list of parameters)
1203- `messages`: (optional) a list of message objects used to create a conversation
1204- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
- `quantize`: (optional) quantize a non-quantized (e.g. float16) model
1206
1207#### Quantization types
1208
1209| Type   | Recommended |
1210| ------ | :---------: |
1211| q4_K_M |     \*      |
1212| q4_K_S |             |
1213| q8_0   |     \*      |
1214
1215### Examples
1216
1217#### Create a new model
1218
1219Create a new model from an existing model.
1220
1221##### Request
1222
1223```shell
1224curl http://localhost:11434/api/create -d '{
1225  "model": "mario",
1226  "from": "llama3.2",
1227  "system": "You are Mario from Super Mario Bros."
1228}'
1229```
1230
1231##### Response
1232
1233A stream of JSON objects is returned:
1234
1235```json
1236{"status":"reading model metadata"}
1237{"status":"creating system layer"}
1238{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
1239{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
1240{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
1241{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
1242{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
1243{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
1244{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
1245{"status":"writing manifest"}
1246{"status":"success"}
1247```
1248
1249#### Quantize a model
1250
1251Quantize a non-quantized model.
1252
1253##### Request
1254
1255```shell
1256curl http://localhost:11434/api/create -d '{
1257  "model": "llama3.2:quantized",
1258  "from": "llama3.2:3b-instruct-fp16",
1259  "quantize": "q4_K_M"
1260}'
1261```
1262
1263##### Response
1264
1265A stream of JSON objects is returned:
1266
1267```json
1268{"status":"quantizing F16 model to Q4_K_M","digest":"0","total":6433687776,"completed":12302}
1269{"status":"quantizing F16 model to Q4_K_M","digest":"0","total":6433687776,"completed":6433687552}
1270{"status":"verifying conversion"}
1271{"status":"creating new layer sha256:fb7f4f211b89c6c4928ff4ddb73db9f9c0cfca3e000c3e40d6cf27ddc6ca72eb"}
1272{"status":"using existing layer sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396"}
1273{"status":"using existing layer sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d"}
1274{"status":"using existing layer sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd"}
1275{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
1276{"status":"writing manifest"}
1277{"status":"success"}
1278```
1279
1280#### Create a model from GGUF
1281
1282Create a model from a GGUF file. The `files` parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use [/api/blobs/:digest](#push-a-blob) to push the GGUF file to the server before calling this API.
1283
1284##### Request
1285
1286```shell
1287curl http://localhost:11434/api/create -d '{
1288  "model": "my-gguf-model",
1289  "files": {
1290    "test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
1291  }
1292}'
1293```
1294
1295##### Response
1296
1297A stream of JSON objects is returned:
1298
1299```json
1300{"status":"parsing GGUF"}
1301{"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
1302{"status":"writing manifest"}
1303{"status":"success"}
1304```
1305
1306#### Create a model from a Safetensors directory
1307
1308The `files` parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use [/api/blobs/:digest](#push-a-blob) to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.
1309
1310##### Request
1311
1312```shell
1313curl http://localhost:11434/api/create -d '{
1314  "model": "fred",
1315  "files": {
1316    "config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
1317    "generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
1318    "special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
1319    "tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
1320    "tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
1321    "model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
1322  }
1323}'
1324```
1325
1326##### Response
1327
1328A stream of JSON objects is returned:
1329
```json
1331{"status":"converting model"}
1332{"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
1333{"status":"using autodetected template llama3-instruct"}
1334{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
1335{"status":"writing manifest"}
1336{"status":"success"}
1337```
1338
1339## Check if a Blob Exists
1340
```
1342HEAD /api/blobs/:digest
1343```
1344
Checks that a file blob (Binary Large Object) used when creating a model exists on the server. This checks your Ollama server, not ollama.com.
1346
### Path Parameters
1348
1349- `digest`: the SHA256 digest of the blob
1350
1351### Examples
1352
1353#### Request
1354
1355```shell
1356curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1357```
1358
1359#### Response
1360
Returns 200 OK if the blob exists, or 404 Not Found if it does not.
1362
1363## Push a Blob
1364
1365```
1366POST /api/blobs/:digest
1367```
1368
1369Push a file to the Ollama server to create a "blob" (Binary Large Object).
1370
### Path Parameters
1372
1373- `digest`: the expected SHA256 digest of the file
1374
1375### Examples
1376
1377#### Request
1378
1379```shell
1380curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1381```
1382
1383#### Response
1384
Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest used is not the expected one.
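
The digest in the blob URL is the SHA256 hash of the file contents, prefixed with `sha256:`. The following Python sketch shows one way to compute it and build the URL used above. The helper names are illustrative, the base URL assumes the default local server, and the temporary file containing `abc` merely stands in for a real model file.

```python
import hashlib
import os
import tempfile


def sha256_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never load fully into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def blob_url(digest: str, base_url: str = "http://localhost:11434") -> str:
    """URL for the HEAD (exists?) and POST (push) blob endpoints."""
    return f"{base_url}/api/blobs/{digest}"


# Demo on a tiny stand-in file; a real call would pass the GGUF or safetensors path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_digest(tmp.name)
os.remove(tmp.name)

print(blob_url(demo_digest))
```

The same digest string is what goes into the `files` dictionary of a `/api/create` request.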
1386
1387## List Local Models
1388
1389```
1390GET /api/tags
1391```
1392
1393List models that are available locally.
1394
1395### Examples
1396
1397#### Request
1398
1399```shell
1400curl http://localhost:11434/api/tags
1401```
1402
1403#### Response
1404
1405A single JSON object will be returned.
1406
1407```json
1408{
1409  "models": [
1410    {
1411      "name": "deepseek-r1:latest",
1412      "model": "deepseek-r1:latest",
1413      "modified_at": "2025-05-10T08:06:48.639712648-07:00",
1414      "size": 4683075271,
1415      "digest": "0a8c266910232fd3291e71e5ba1e058cc5af9d411192cf88b6d30e92b6e73163",
1416      "details": {
1417        "parent_model": "",
1418        "format": "gguf",
1419        "family": "qwen2",
1420        "families": ["qwen2"],
1421        "parameter_size": "7.6B",
1422        "quantization_level": "Q4_K_M"
1423      }
1424    },
1425    {
1426      "name": "llama3.2:latest",
1427      "model": "llama3.2:latest",
1428      "modified_at": "2025-05-04T17:37:44.706015396-07:00",
1429      "size": 2019393189,
1430      "digest": "a80c4f17acd55265feec403c7aef86be0c25983ab279d83f3bcd3abbcb5b8b72",
1431      "details": {
1432        "parent_model": "",
1433        "format": "gguf",
1434        "family": "llama",
1435        "families": ["llama"],
1436        "parameter_size": "3.2B",
1437        "quantization_level": "Q4_K_M"
1438      }
1439    }
1440  ]
1441}
1442```
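
As an illustration of consuming this response, the Python sketch below lists models by size and totals the disk usage. It assumes `size` is a byte count, and it uses a trimmed copy of the response above; `summarize_models` is a hypothetical helper.

```python
import json


def summarize_models(tags_response: dict) -> list:
    """Return (name, size in GB) pairs, largest model first."""
    pairs = [(m["name"], m["size"] / 1e9) for m in tags_response["models"]]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the example response above.
response = json.loads("""
{
  "models": [
    {"name": "llama3.2:latest", "size": 2019393189},
    {"name": "deepseek-r1:latest", "size": 4683075271}
  ]
}
""")

for name, gigabytes in summarize_models(response):
    print(f"{name}: {gigabytes:.2f} GB")

total_bytes = sum(m["size"] for m in response["models"])
print(f"total: {total_bytes / 1e9:.2f} GB")
```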
1443
1444## Show Model Information
1445
1446```
1447POST /api/show
1448```
1449
Show information about a model, including details, modelfile, template, parameters, license, and system prompt.
1451
1452### Parameters
1453
1454- `model`: name of the model to show
1455- `verbose`: (optional) if set to `true`, returns full data for verbose response fields
1456
1457### Examples
1458
1459#### Request
1460
1461```shell
1462curl http://localhost:11434/api/show -d '{
1463  "model": "llava"
1464}'
1465```
1466
1467#### Response
1468
1469```json5
1470{
1471  modelfile: '# Modelfile generated by "ollama show"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE """{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: """\nPARAMETER num_ctx 4096\nPARAMETER stop "\u003c/s\u003e"\nPARAMETER stop "USER:"\nPARAMETER stop "ASSISTANT:"',
1472  parameters: 'num_keep                       24\nstop                           "<|start_header_id|>"\nstop                           "<|end_header_id|>"\nstop                           "<|eot_id|>"',
1473  template: "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
1474  details: {
1475    parent_model: "",
1476    format: "gguf",
1477    family: "llama",
1478    families: ["llama"],
1479    parameter_size: "8.0B",
1480    quantization_level: "Q4_0",
1481  },
1482  model_info: {
1483    "general.architecture": "llama",
1484    "general.file_type": 2,
1485    "general.parameter_count": 8030261248,
1486    "general.quantization_version": 2,
1487    "llama.attention.head_count": 32,
1488    "llama.attention.head_count_kv": 8,
1489    "llama.attention.layer_norm_rms_epsilon": 0.00001,
1490    "llama.block_count": 32,
1491    "llama.context_length": 8192,
1492    "llama.embedding_length": 4096,
1493    "llama.feed_forward_length": 14336,
1494    "llama.rope.dimension_count": 128,
1495    "llama.rope.freq_base": 500000,
1496    "llama.vocab_size": 128256,
1497    "tokenizer.ggml.bos_token_id": 128000,
1498    "tokenizer.ggml.eos_token_id": 128009,
1499    "tokenizer.ggml.merges": [], // populates if `verbose=true`
1500    "tokenizer.ggml.model": "gpt2",
1501    "tokenizer.ggml.pre": "llama-bpe",
1502    "tokenizer.ggml.token_type": [], // populates if `verbose=true`
1503    "tokenizer.ggml.tokens": [], // populates if `verbose=true`
1504  },
1505  capabilities: ["completion", "vision"],
1506}
1507```
1508
1509## Copy a Model
1510
1511```
1512POST /api/copy
1513```
1514
1515Copy a model. Creates a model with another name from an existing model.
1516
1517### Examples
1518
1519#### Request
1520
1521```shell
1522curl http://localhost:11434/api/copy -d '{
1523  "source": "llama3.2",
1524  "destination": "llama3-backup"
1525}'
1526```
1527
1528#### Response
1529
1530Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.
1531
1532## Delete a Model
1533
1534```
1535DELETE /api/delete
1536```
1537
1538Delete a model and its data.
1539
1540### Parameters
1541
1542- `model`: model name to delete
1543
1544### Examples
1545
1546#### Request
1547
1548```shell
1549curl -X DELETE http://localhost:11434/api/delete -d '{
1550  "model": "llama3:13b"
1551}'
1552```
1553
1554#### Response
1555
Returns a 200 OK if successful, or a 404 Not Found if the model to be deleted doesn't exist.
1557
1558## Pull a Model
1559
1560```
1561POST /api/pull
1562```
1563
Download a model from the Ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
1565
1566### Parameters
1567
1568- `model`: name of the model to pull
1569- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
1570- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
1571
1572### Examples
1573
1574#### Request
1575
1576```shell
1577curl http://localhost:11434/api/pull -d '{
1578  "model": "llama3.2"
1579}'
1580```
1581
1582#### Response
1583
1584If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
1585
1586The first object is the manifest:
1587
1588```json
1589{
1590  "status": "pulling manifest"
1591}
1592```
1593
Then there is a series of downloading responses. The `completed` key may be absent until part of a download has completed. The number of files to be downloaded depends on the number of layers specified in the manifest.
1595
1596```json
1597{
1598  "status": "pulling digestname",
1599  "digest": "digestname",
1600  "total": 2142590208,
1601  "completed": 241970
1602}
1603```
1604
1605After all the files are downloaded, the final responses are:
1606
1607```json
1608{
1609    "status": "verifying sha256 digest"
1610}
1611{
1612    "status": "writing manifest"
1613}
1614{
1615    "status": "removing any unused layers"
1616}
1617{
1618    "status": "success"
1619}
1620```
1621
If `stream` is set to `false`, then the response is a single JSON object:
1623
1624```json
1625{
1626  "status": "success"
1627}
1628```
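
A client typically reads the streamed objects one at a time and reports progress. The Python sketch below assumes each JSON object arrives on its own line (newline-delimited JSON) and treats a missing `completed` key as zero. The `pull_progress` helper and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def pull_progress(lines: Iterable[str]) -> Iterator[str]:
    """Turn streamed /api/pull objects into human-readable progress lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines, if any
        event = json.loads(line)
        total = event.get("total")
        if total:
            # "completed" may be absent early in a download.
            completed = event.get("completed", 0)
            yield f"{event['status']}: {100 * completed / total:.1f}%"
        else:
            yield event["status"]


# Sample objects modeled on the responses shown above.
sample = [
    '{"status": "pulling manifest"}',
    '{"status": "pulling digestname", "digest": "digestname", "total": 2142590208}',
    '{"status": "pulling digestname", "digest": "digestname", "total": 2142590208, "completed": 1071295104}',
    '{"status": "success"}',
]

for message in pull_progress(sample):
    print(message)
```

The same loop shape should apply to the other streaming endpoints that report `total` and `completed`, such as `/api/push`.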
1629
1630## Push a Model
1631
1632```
1633POST /api/push
1634```
1635
Upload a model to a model library. Requires registering for ollama.com and adding a public key first.
1637
1638### Parameters
1639
1640- `model`: name of the model to push in the form of `<namespace>/<model>:<tag>`
1641- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
1642- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
1643
1644### Examples
1645
1646#### Request
1647
1648```shell
1649curl http://localhost:11434/api/push -d '{
1650  "model": "mattw/pygmalion:latest"
1651}'
1652```
1653
1654#### Response
1655
1656If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
1657
1658```json
1659{ "status": "retrieving manifest" }
1660```
1661
1662and then:
1663
1664```json
1665{
1666  "status": "starting upload",
1667  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1668  "total": 1928429856
1669}
1670```
1671
1672Then there is a series of uploading responses:
1673
1674```json
1675{
1676  "status": "starting upload",
1677  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1678  "total": 1928429856
1679}
1680```
1681
1682Finally, when the upload is complete:
1683
1684```json
1685{"status":"pushing manifest"}
1686{"status":"success"}
1687```
1688
1689If `stream` is set to `false`, then the response is a single JSON object:
1690
1691```json
1692{ "status": "success" }
1693```
1694
1695## Generate Embeddings
1696
1697```
1698POST /api/embed
1699```
1700
Generate embeddings from a model.
1702
1703### Parameters
1704
1705- `model`: name of model to generate embeddings from
1706- `input`: text or list of text to generate embeddings for
1707
1708Advanced parameters:
1709
1710- `truncate`: truncates the end of each input to fit within context length. Returns error if `false` and context length is exceeded. Defaults to `true`
1711- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.mdx#valid-parameters-and-values) such as `temperature`
1712- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
1713- `dimensions`: number of dimensions for the embedding
1714
1715### Examples
1716
1717#### Request
1718
1719```shell
1720curl http://localhost:11434/api/embed -d '{
1721  "model": "all-minilm",
1722  "input": "Why is the sky blue?"
1723}'
1724```
1725
1726#### Response
1727
1728```json
1729{
1730  "model": "all-minilm",
1731  "embeddings": [
1732    [
1733      0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1734      0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1735    ]
1736  ],
1737  "total_duration": 14143917,
1738  "load_duration": 1019500,
1739  "prompt_eval_count": 8
1740}
1741```
1742
1743#### Request (Multiple input)
1744
1745```shell
1746curl http://localhost:11434/api/embed -d '{
1747  "model": "all-minilm",
1748  "input": ["Why is the sky blue?", "Why is the grass green?"]
1749}'
1750```
1751
1752#### Response
1753
1754```json
1755{
1756  "model": "all-minilm",
1757  "embeddings": [
1758    [
1759      0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1760      0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1761    ],
1762    [
1763      -0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
1764      0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481
1765    ]
1766  ]
1767}
1768```
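
A common use of the returned vectors is comparing inputs by cosine similarity. The following Python sketch applies it to the two vectors above. Note that the documented vectors are truncated to ten values, so the printed number is only illustrative; `cosine_similarity` is a hypothetical helper, not part of the API.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Truncated vectors copied from the example response above.
sky = [0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
       0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348]
grass = [-0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
         0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481]

print(f"similarity: {cosine_similarity(sky, grass):.3f}")
```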
1769
1770## List Running Models
1771
1772```
1773GET /api/ps
1774```
1775
1776List models that are currently loaded into memory.
1777
### Examples

#### Request
1781
1782```shell
1783curl http://localhost:11434/api/ps
1784```
1785
1786#### Response
1787
1788A single JSON object will be returned.
1789
1790```json
1791{
1792  "models": [
1793    {
1794      "name": "mistral:latest",
1795      "model": "mistral:latest",
1796      "size": 5137025024,
1797      "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
1798      "details": {
1799        "parent_model": "",
1800        "format": "gguf",
1801        "family": "llama",
1802        "families": ["llama"],
1803        "parameter_size": "7.2B",
1804        "quantization_level": "Q4_0"
1805      },
1806      "expires_at": "2024-06-04T14:38:31.83753-07:00",
1807      "size_vram": 5137025024
1808    }
1809  ]
1810}
1811```
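
One thing a client might derive from this response is how much of a loaded model sits in GPU memory. The Python sketch below assumes that `size` is the total loaded size in bytes and `size_vram` is the portion held in VRAM. That reading of the fields is an assumption, and `vram_fraction` is a hypothetical helper.

```python
def vram_fraction(model: dict) -> float:
    """Fraction of the loaded model reported as resident in VRAM (assumed meaning)."""
    if not model.get("size"):
        return 0.0
    return model.get("size_vram", 0) / model["size"]


# Entry copied (trimmed) from the example response above.
running = {"name": "mistral:latest", "size": 5137025024, "size_vram": 5137025024}

print(f"{running['name']}: {vram_fraction(running):.0%} in VRAM")
```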
1812
1813## Generate Embedding
1814
1815> Note: this endpoint has been superseded by `/api/embed`
1816
1817```
1818POST /api/embeddings
1819```
1820
Generate embeddings from a model.
1822
1823### Parameters
1824
1825- `model`: name of model to generate embeddings from
1826- `prompt`: text to generate embeddings for
1827
1828Advanced parameters:
1829
1830- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.mdx#valid-parameters-and-values) such as `temperature`
1831- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
1832
1833### Examples
1834
1835#### Request
1836
1837```shell
1838curl http://localhost:11434/api/embeddings -d '{
1839  "model": "all-minilm",
1840  "prompt": "Here is an article about llamas..."
1841}'
1842```
1843
1844#### Response
1845
1846```json
1847{
1848  "embedding": [
1849    0.5670403838157654, 0.009260174818336964, 0.23178744316101074,
1850    -0.2916173040866852, -0.8924556970596313, 0.8785552978515625,
1851    -0.34576427936553955, 0.5742510557174683, -0.04222835972905159,
1852    -0.137906014919281
1853  ]
1854}
1855```
1856
1857## Version
1858
1859```
1860GET /api/version
1861```
1862
Retrieve the Ollama version.
1864
1865### Examples
1866
1867#### Request
1868
1869```shell
1870curl http://localhost:11434/api/version
1871```
1872
1873#### Response
1874
1875```json
1876{
1877  "version": "0.5.1"
1878}
1879```
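
Clients sometimes gate features on the server version. The Python sketch below shows one way to compare the returned string against a minimum. It assumes the version begins with dot-separated integers and ignores any suffix; `parse_version` and the minimum `(0, 5, 0)` are illustrative choices.

```python
import re


def parse_version(version: str) -> tuple:
    """Parse the leading dotted-integer part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Body of the example response above.
response = {"version": "0.5.1"}

if parse_version(response["version"]) >= (0, 5, 0):
    print("server is new enough")
else:
    print("server is too old")
```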
1880
1881## Experimental Features
1882
1883### Image Generation (Experimental)
1884
1885> [!WARNING]
1886> Image generation is experimental and may change in future versions.
1887
1888Image generation is now supported through the standard `/api/generate` endpoint when using image generation models. The API automatically detects when an image generation model is being used.
1889
1890See the [Generate a completion](#generate-a-completion) section for the full API documentation. The experimental image generation parameters (`width`, `height`, `steps`) are documented there.
1891
1892#### Example
1893
1894##### Request
1895
1896```shell
1897curl http://localhost:11434/api/generate -d '{
1898  "model": "x/z-image-turbo",
1899  "prompt": "a sunset over mountains",
1900  "width": 1024,
1901  "height": 768
1902}'
1903```
1904
1905##### Response (streaming)
1906
1907Progress updates during generation:
1908
1909```json
1910{
1911  "model": "x/z-image-turbo",
1912  "created_at": "2024-01-15T10:30:00.000000Z",
1913  "completed": 5,
1914  "total": 20,
1915  "done": false
1916}
1917```
1918
1919##### Final Response
1920
1921```json
1922{
1923  "model": "x/z-image-turbo",
1924  "created_at": "2024-01-15T10:30:15.000000Z",
1925  "image": "iVBORw0KGgoAAAANSUhEUg...",
1926  "done": true,
1927  "done_reason": "stop",
1928  "total_duration": 15000000000,
1929  "load_duration": 2000000000
1930}
1931```