---
base_model:
- {base_model}
---
# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

The endpoint can then be accessed at http://localhost:8080/embedding, for
example using `curl`:
```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```
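If `jq` is installed, the response can be inspected directly from the command line. As a sketch (assuming a recent `llama-server` build, which also exposes an OpenAI-compatible `/v1/embeddings` endpoint that wraps the vector in a `data` array; the exact response shape can vary between versions):

```shell
# Query the OpenAI-compatible endpoint (assumption: available in this build)
# and print the dimensionality of the returned embedding vector with jq.
curl --request POST \
    --url http://localhost:8080/v1/embeddings \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent | jq '.data[0].embedding | length'
```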

Alternatively, the `llama-embedding` command line tool can be used:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize
When a model uses pooling, or a pooling method is specified with `--pooling`,
normalization of the embeddings can be controlled by the `embd_normalize`
parameter.

The default value is `2`, which means the embeddings are normalized using the
Euclidean (L2) norm. The other options are:
* -1 No normalization
*  0 Max absolute
*  1 Taxicab
*  2 Euclidean/L2
* \>2 P-Norm
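As an illustration of what these options compute (this is the standard p-norm definition, not code taken from the llama.cpp sources), each component of an embedding vector $x$ is divided by a norm, while `-1` leaves the vector unscaled:

$$
\hat{x}_j = \frac{x_j}{\lVert x \rVert}, \qquad
\lVert x \rVert =
\begin{cases}
\max_k \lvert x_k \rvert & \text{max absolute } (0) \\
\sum_k \lvert x_k \rvert & \text{taxicab } (p = 1) \\
\sqrt{\sum_k x_k^2} & \text{Euclidean } (p = 2) \\
\left( \sum_k \lvert x_k \rvert^p \right)^{1/p} & \text{p-norm } (p > 2)
\end{cases}
$$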

This can be passed in the request body to `llama-server`, for example:
```sh
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```
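Putting this together with the earlier `curl` invocation, a complete request that disables normalization looks like:

```shell
# Same request as before, with normalization disabled via embd_normalize.
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
    --silent
```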

And for `llama-embedding`, by passing `--embd-normalize <value>`, for example:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```