---
base_model:
- {base_model}
---
# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

The server then exposes an embedding endpoint at http://localhost:8080/embedding,
which can be queried with `curl`, for example:
```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```

Alternatively, the `llama-embedding` command-line tool can be used:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize
When a model uses pooling, either by default or because a pooling method was set
with `--pooling`, the normalization of the pooled embedding can be controlled
with the `embd_normalize` parameter.

The default value is `2`, which normalizes the embeddings using the Euclidean
(L2) norm. The available values are:
* `-1`: no normalization
* `0`: max absolute (scaled to the int16 range)
* `1`: taxicab (L1)
* `2`: Euclidean (L2)
* `>2`: p-norm
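
Each of these values divides every component of the pooled embedding by one
number computed from the whole vector. The sketch below illustrates that
arithmetic and is not code from llama.cpp: `embd_normalize` is a hypothetical
shell function that reads one embedding component per line on standard input
and prints the normalized components. For value `0` it assumes, following our
reading of llama.cpp's `common_embd_normalize`, that the divisor is the largest
absolute component divided by 32760, so the largest component maps to ±32760;
verify this against the llama.cpp version you use.

```sh
# Hypothetical helper (not part of llama.cpp): normalizes an embedding read
# from stdin, one component per line, using an embd_normalize-style value.
embd_normalize() (
    awk -v p="$1" '
        {{ x[NR] = $1 }}
        END {{
            s = 0
            if (p == -1) s = 1                  # -1: no normalization
            else if (p == 0) {{                 # 0: max absolute
                for (i = 1; i <= NR; i++) {{
                    a = x[i] < 0 ? -x[i] : x[i]
                    if (a > s) s = a
                }}
                s = s / 32760                   # assumed int16 scaling, as in llama.cpp
            }} else {{                          # 1: taxicab, 2: L2, >2: p-norm
                for (i = 1; i <= NR; i++) {{
                    a = x[i] < 0 ? -x[i] : x[i]
                    s += a ^ p
                }}
                s = s ^ (1 / p)
            }}
            for (i = 1; i <= NR; i++) {{
                y = s > 0 ? x[i] / s : 0        # leave an all-zero vector as zeros
                print y
            }}
        }}'
)
```

For example, `printf '3\n4\n' | embd_normalize 2` prints `0.6` and `0.8`, a
vector whose Euclidean length is 1.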

The parameter can be passed in the request body to `llama-server`, for example:
```sh
curl --silent http://localhost:8080/embedding --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}'
```

For `llama-embedding`, pass `--embd-normalize <value>`, for example:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```