# Gemma 3 vision

> [!IMPORTANT]
>
> This is very experimental and only intended for demo purposes.

## Quick start

You can use a pre-quantized model from [ggml-org](https://huggingface.co/ggml-org)'s Hugging Face account:

```bash
# build
cmake -B build
cmake --build build --target llama-mtmd-cli

# alternatively, install from brew (macOS)
brew install llama.cpp

# run it
llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF
llama-mtmd-cli -hf ggml-org/gemma-3-12b-it-GGUF
llama-mtmd-cli -hf ggml-org/gemma-3-27b-it-GGUF

# note: the 1B model does not support vision
```
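
The commands above start an interactive chat session. For a quick one-shot test, you can also pass an image and a prompt directly. This is a minimal sketch: `--image` is documented below, `-p` is assumed to behave here as in the other llama.cpp CLI tools, and the image path and prompt text are placeholders.

```bash
# one-shot: describe an image, then exit instead of chatting
llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF \
    --image your_image.jpg \
    -p "Describe this image in one sentence."
```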

## How to get mmproj.gguf?

Simply add the `--mmproj` flag when converting the model via `convert_hf_to_gguf.py`:

```bash
cd gemma-3-4b-it
python ../llama.cpp/convert_hf_to_gguf.py --outfile model.gguf --outtype f16 --mmproj .
# output file: mmproj-model.gguf
```
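
An F16 text model can be large. If it does not fit on your machine, you can optionally shrink it with the `llama-quantize` tool. A minimal sketch, assuming a `model.gguf` produced by the converter and `Q4_K_M` as one common quantization type; the mmproj file is normally left at F16:

```bash
# optional: quantize the F16 text model to reduce its size
cmake --build build --target llama-quantize
./build/bin/llama-quantize model.gguf model-Q4_K_M.gguf Q4_K_M
```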

## How to run it?

What you need:
- The text model GGUF, which can be converted using `convert_hf_to_gguf.py`
- The mmproj file from the step above
- An image file

```bash
# build
cmake -B build
cmake --build build --target llama-mtmd-cli

# run it
./build/bin/llama-mtmd-cli -m {text_model}.gguf --mmproj mmproj-model.gguf --image your_image.jpg
```
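
If llama.cpp was built with GPU support (CUDA, Metal, Vulkan, etc.), layers can be offloaded the same way as with the other llama.cpp tools. A sketch, assuming the standard `-ngl` (`--n-gpu-layers`) flag; `99` simply means "offload all layers":

```bash
# run with full GPU offload (requires a GPU-enabled build)
./build/bin/llama-mtmd-cli -m {text_model}.gguf --mmproj mmproj-model.gguf \
    --image your_image.jpg -ngl 99
```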