# llama.cpp/examples/eval-callback

A simple example which demonstrates how to use a callback during inference.
It prints all operations and tensor data to the console.
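
Internally, the example registers an eval callback on the llama context; ggml then invokes it for every node of the compute graph, once to ask whether the tensor data is wanted and once after the operation has been computed. Below is a minimal sketch of that mechanism, assuming the `ggml_backend_sched_eval_callback` signature from `ggml-backend.h` and the `cb_eval` / `cb_eval_user_data` fields of `llama_context_params`; see `eval-callback.cpp` for the actual implementation.

```cpp
// Minimal sketch of registering an eval callback (assumes the
// ggml_backend_sched_eval_callback typedef and the cb_eval fields of
// llama_context_params; not a copy of the example's code).
#include "llama.h"
#include "ggml.h"

#include <cinttypes>
#include <cstdio>

// The callback is invoked twice per tensor: first with ask == true so it can
// decide whether it wants the tensor data, then with ask == false once the
// operation has been computed and the data can be inspected.
static bool print_op_cb(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // request the data for every tensor
    }
    fprintf(stderr, "%s = (%s) %s = {%" PRId64 ", %" PRId64 ", %" PRId64 ", %" PRId64 "}\n",
            t->name, ggml_type_name(t->type), ggml_op_desc(t),
            t->ne[0], t->ne[1], t->ne[2], t->ne[3]);
    return true; // returning false aborts the graph computation
}

int main() {
    llama_context_params cparams = llama_context_default_params();
    cparams.cb_eval           = print_op_cb;
    cparams.cb_eval_user_data = nullptr;
    // ... load a model, create the context with these params
    //     (llama_new_context_with_model), tokenize a prompt and decode;
    //     every graph node then passes through print_op_cb.
    return 0;
}
```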

Usage:

```shell
llama-eval-callback \
  --hf-repo ggml-org/models \
  --hf-file phi-2/ggml-model-q4_0.gguf \
  --model phi-2-q4_0.gguf \
  --prompt hello \
  --seed 42 \
  -ngl 33
```

Will print:

```shell
llm_load_tensors: offloaded 33/33 layers to GPU
...
llama_new_context_with_model: n_ctx = 512
...
llama_new_context_with_model: CUDA0 compute buffer size = 105.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 6.01 MiB
llama_new_context_with_model: graph nodes = 1225
llama_new_context_with_model: graph splits = 2
ggml_debug: inp_embd = (f32) GET_ROWS(token_embd.weight{2560, 51200, 1, 1}, inp_tokens{1, 1, 1, 1}}) = {2560, 1, 1, 1}
 [
  [
   [ -0.0181, 0.0272, 0.0272, ...],
  ],
 ]
ggml_debug: norm-0 = (f32) NORM(CUDA0#inp_embd#0{2560, 1, 1, 1}, }) = {2560, 1, 1, 1}
 [
  [
   [ -0.6989, 1.0636, 1.0636, ...],
  ],
 ]
ggml_debug: norm_w-0 = (f32) MUL(norm-0{2560, 1, 1, 1}, blk.0.attn_norm.weight{2560, 1, 1, 1}}) = {2560, 1, 1, 1}
 [
  [
   [ -0.1800, 0.2817, 0.2632, ...],
  ],
 ]
ggml_debug: attn_norm-0 = (f32) ADD(norm_w-0{2560, 1, 1, 1}, blk.0.attn_norm.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1}
 [
  [
   [ -0.1863, 0.2970, 0.2604, ...],
  ],
 ]
ggml_debug: wqkv-0 = (f32) MUL_MAT(blk.0.attn_qkv.weight{2560, 7680, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {7680, 1, 1, 1}
 [
  [
   [ -1.1238, 1.2876, -1.8086, ...],
  ],
 ]
ggml_debug: bqkv-0 = (f32) ADD(wqkv-0{7680, 1, 1, 1}, blk.0.attn_qkv.bias{7680, 1, 1, 1}}) = {7680, 1, 1, 1}
 [
  [
   [ -1.1135, 1.4604, -1.9226, ...],
  ],
 ]
ggml_debug: bqkv-0 (view) = (f32) VIEW(bqkv-0{7680, 1, 1, 1}, }) = {2560, 1, 1, 1}
 [
  [
   [ -1.1135, 1.4604, -1.9226, ...],
  ],
 ]
ggml_debug: Qcur-0 = (f32) CONT(bqkv-0 (view){2560, 1, 1, 1}, }) = {2560, 1, 1, 1}
 [
  [
   [ -1.1135, 1.4604, -1.9226, ...],
  ],
 ]
ggml_debug: Qcur-0 (reshaped) = (f32) RESHAPE(Qcur-0{2560, 1, 1, 1}, }) = {80, 32, 1, 1}
 [
  [
   [ -1.1135, 1.4604, -1.9226, ...],
   [ -0.3608, 0.5076, -1.8866, ...],
   [ 1.7643, 0.0273, -2.1065, ...],
   ...
  ],
 ]
ggml_debug: Qcur-0 = (f32) ROPE(Qcur-0 (reshaped){80, 32, 1, 1}, CUDA0#inp_pos#0{1, 1, 1, 1}}) = {80, 32, 1, 1}
 [
  [
   [ -1.1135, 1.4604, -1.9226, ...],
   [ -0.3608, 0.5076, -1.8866, ...],
   [ 1.7643, 0.0273, -2.1065, ...],
   ...
  ],
 ]
```
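
Each `ggml_debug` line shows the destination tensor's name and type, the operation that produced it together with its source tensors and their shapes, and the shape of the result; the nested brackets underneath preview the first values along each dimension. When a tensor lives in device memory (for example `CUDA0#inp_embd#0` above), its data must first be copied back to the host before it can be printed. A minimal sketch of that step, assuming `ggml_backend_buffer_is_host` and `ggml_backend_tensor_get` from `ggml-backend.h`:

```cpp
// Sketch of fetching tensor data inside the callback when the tensor is not
// host-resident (assumes ggml_backend_buffer_is_host and
// ggml_backend_tensor_get from ggml-backend.h; the real example does the
// equivalent before pretty-printing the nested arrays shown above).
#include "ggml.h"
#include "ggml-backend.h"

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

static void print_first_values(const struct ggml_tensor * t) {
    std::vector<uint8_t> buf;
    const uint8_t * data = (const uint8_t *) t->data;

    if (!ggml_backend_buffer_is_host(t->buffer)) {
        // tensor lives in device memory (e.g. CUDA0): copy it back to the host
        buf.resize(ggml_nbytes(t));
        ggml_backend_tensor_get(t, buf.data(), 0, buf.size());
        data = buf.data();
    }

    if (t->type == GGML_TYPE_F32) {
        const float * f = (const float *) data;
        const int64_t n = std::min<int64_t>(ggml_nelements(t), 3);
        for (int64_t i = 0; i < n; i++) {
            printf("%s%8.4f", i > 0 ? ", " : "[ ", f[i]);
        }
        printf(", ...]\n");
    }
}
```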