# llama-gguf-hash

CLI to hash GGUF files to detect differences at a per-model and per-tensor level.

**Command line options:**

- `--help`: display help message
- `--xxh64`: use xxhash 64-bit hash mode (default)
- `--sha1`: use sha1
- `--sha256`: use sha256
- `--all`: use all hashes
- `--no-layer`: exclude per-layer hashes
- `--uuid`: generate UUIDv5 ID
- `-c`, `--check <manifest>`: verify against a manifest

## About

While most POSIX systems already have hash checking programs like sha256sum, they
are designed to check entire files. This is not ideal for our purpose if we want
to check the consistency of the tensor data even if the metadata content of the
gguf KV store has been updated.
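The limitation of whole-file hashing can be sketched with a toy file. This is only an illustration: the two-part layout below (one text line standing in for metadata, followed by a payload) is made up and is not the GGUF format, and it assumes `sha256sum` from coreutils is available.

```bash
#!/usr/bin/env bash
# Toy illustration only: a made-up two-part file stands in for a model file.
# Line 1 plays the role of metadata, the rest plays the role of tensor data.
workdir=$(mktemp -d)

printf 'version=1\n'    >  "$workdir/a.bin"
printf 'PAYLOAD-BYTES'  >> "$workdir/a.bin"
printf 'version=2\n'    >  "$workdir/b.bin"   # only the "metadata" differs
printf 'PAYLOAD-BYTES'  >> "$workdir/b.bin"

# Whole-file digests: these cover the metadata line as well as the payload.
whole_a=$(sha256sum < "$workdir/a.bin" | cut -d' ' -f1)
whole_b=$(sha256sum < "$workdir/b.bin" | cut -d' ' -f1)

# Payload-only digests: skip line 1 and hash everything after it.
payload_a=$(tail -n +2 "$workdir/a.bin" | sha256sum | cut -d' ' -f1)
payload_b=$(tail -n +2 "$workdir/b.bin" | sha256sum | cut -d' ' -f1)

echo "whole file equal:   $([ "$whole_a" = "$whole_b" ] && echo yes || echo no)"
echo "payload only equal: $([ "$payload_a" = "$payload_b" ] && echo yes || echo no)"
rm -rf "$workdir"
```

The whole-file digests differ while the payload-only digests match, which is the situation a tensor-level hash is meant to handle.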

This program is designed to hash a gguf tensor payload on a 'per tensor layer'
basis in addition to an 'entire tensor model' hash. The intent is that the entire
tensor model hash can be checked first, but if any inconsistency is detected,
then the per tensor hashes can be used to narrow down the specific tensor layer
that is inconsistent.
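The narrowing-down step can be sketched by comparing two manifests. This is a rough sketch, not part of the tool: the helper name `changed_entries` is hypothetical, the digests are made-up placeholders, and it assumes each manifest line is laid out as `<type>  <digest>  <name>` with no spaces in the name.

```bash
#!/usr/bin/env bash
# Sketch: list entries whose digest differs between two manifests.
# The digests below are made-up placeholders, not real hashes.
workdir=$(mktemp -d)
cat > "$workdir/old.manifest" <<'EOF'
xxh64     aaaaaaaaaaaaaaaa  model.gguf:tensor_0
xxh64     bbbbbbbbbbbbbbbb  model.gguf:tensor_1
xxh64     cccccccccccccccc  model.gguf
EOF
cat > "$workdir/new.manifest" <<'EOF'
xxh64     aaaaaaaaaaaaaaaa  model.gguf:tensor_0
xxh64     dddddddddddddddd  model.gguf:tensor_1
xxh64     eeeeeeeeeeeeeeee  model.gguf
EOF

# Key each line on "type + name" so several hash types can share one file.
# The first file fills the lookup table; the second file is compared to it.
changed_entries() {
    awk '{ key = $1 " " $3 }
         NR == FNR { digest[key] = $2; next }
         key in digest && digest[key] != $2 { print $3 }' "$1" "$2"
}

changed=$(changed_entries "$workdir/old.manifest" "$workdir/new.manifest")
echo "$changed"
rm -rf "$workdir"
```

Here the whole-model entry `model.gguf` is reported as changed, and the per tensor entries show that only `model.gguf:tensor_1` is responsible.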

For Maintainers:
- Detection of tensor inconsistency during development and automated tests
    - This is served by xxh64 which is fast
    - This is also served by having per tensor layer hashes to assist in narrowing down
      the location of the faulty tensor layer
    - This is also served by sha1 which is much slower but more widely supported

For Model Creators:
- Optional consistent UUID generation based on model tensor content
    - This is served by UUIDv5 which is useful for database keys
        - llama.cpp UUIDv5 Namespace: `ef001206-dadc-5f6d-a15f-3359e577d4e5`
            - Made via UUIDv5 URL namespace of `en.wikipedia.org/wiki/Llama.cpp`
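The namespace derivation described above can be tried from a shell. This sketch assumes the `uuidgen` from util-linux (2.31 or newer), which offers `--sha1`, `--namespace` and `--name`; other `uuidgen` builds may not support these options, in which case the script only prints a notice.

```bash
#!/usr/bin/env bash
# Sketch: derive a name-based (SHA-1, version 5) UUID from the standard
# URL namespace. Assumes util-linux uuidgen; "@url" is its alias for the
# predefined URL namespace.
name='en.wikipedia.org/wiki/Llama.cpp'

if ns=$(uuidgen --sha1 --namespace @url --name "$name" 2>/dev/null); then
    echo "derived namespace: $ns"
else
    ns=''
    echo "this uuidgen does not support name-based UUIDs" >&2
fi
```

If the namespace was made as described in the list above, the printed value should match the llama.cpp UUIDv5 namespace quoted there. UUIDv5 is deterministic, so the same name always yields the same ID.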

For Model Users:
- Assurance of tensor layer integrity even if metadata was updated
    - This is served by sha256 which is still considered very secure as of 2024

### Design Note

- The default behavior of this program if no arguments are provided is to hash
  using xxhash's xxh64 mode because it is very fast and is primarily targeted
  towards maintainers who may want to use this in automated tests.
- xxhash also supports xxh32 and xxh128 for 32-bit and 128-bit hashes respectively;
  however, we picked 64-bit xxhash as most computers are 64-bit as of 2024 and thus
  would have a better affinity for calculating a hash that is 64 bits in size.

## Compile Example

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DLLAMA_FATAL_WARNINGS=ON
make -C build clean
make -C build llama-gguf-hash VERBOSE=1
./build/bin/llama-gguf-hash test.gguf
./build/bin/llama-gguf-hash --xxh64 test.gguf
./build/bin/llama-gguf-hash --sha1 test.gguf
./build/bin/llama-gguf-hash --uuid test.gguf
./build/bin/llama-gguf-hash --sha256 test.gguf
```

## Generation and Verification Example

To generate a manifest, we may use this command:

```bash
./llama-gguf-hash --all test.gguf > test.gguf.manifest
```

This generates a manifest like the one below, which contains multiple hash types as well as per tensor layer hashes
(this excludes the UUID, as that is an ID and not a hash):

```bash
xxh64     f66e9cd66a4396a0  test.gguf:tensor_0
sha1      59f79ecefd8125a996fdf419239051a7e99e5f20  test.gguf:tensor_0
sha256    c0510d38fa060c46265e0160a85c7243096b01dd31c2f355bdbb5516b20de1bd  test.gguf:tensor_0
xxh64     7d3a1f9ac04d0537  test.gguf:tensor_1
sha1      4765f592eacf096df4628ba59476af94d767080a  test.gguf:tensor_1
sha256    8514cbcc73692a2c56bd7a33a022edd5ff819614bd23b19915d7224387f397a7  test.gguf:tensor_1
xxh64     a0af5d700049693b  test.gguf:tensor_2
sha1      25cbfbad4513cc348e2c95ebdee69d6ff2fd8753  test.gguf:tensor_2
sha256    947e6b36e20f2cc95e1d2ce1c1669d813d574657ac6b5ac5196158d454d35180  test.gguf:tensor_2
xxh64     e83fddf559d7b6a6  test.gguf:tensor_3
sha1      a9cba73e2d90f2ee3dae2548caa42bef3fe6a96c  test.gguf:tensor_3
sha256    423b044e016d8ac73c39f23f60bf01bedef5ecb03c0230accd824c91fe86f1a1  test.gguf:tensor_3
xxh64     1257733306b7992d  test.gguf:tensor_4
sha1      d7bc61db93bb685ce9d598da89717c66729b7543  test.gguf:tensor_4
sha256    79737cb3912d4201384cf7f16a1a37ff7823f23ea796cb205b6ca361ab9e3ebf  test.gguf:tensor_4
xxh64     d238d16ba4711e58  test.gguf:tensor_5
sha1      0706566c198fe1072f37e0a5135b4b5f23654c52  test.gguf:tensor_5
sha256    60949be8298eced0ecdde64487643d018407bd261691e061d9e9c3dbc9fd358b  test.gguf:tensor_5
xxh64     3fbc3b65ab8c7f39  test.gguf:tensor_6
sha1      73922a0727226a409049f6fc3172a52219ca6f00  test.gguf:tensor_6
sha256    574f4c46ff384a3b9a225eb955d2a871847a2e8b3fa59387a8252832e92ef7b0  test.gguf:tensor_6
xxh64     c22021c29854f093  test.gguf:tensor_7
sha1      efc39cece6a951188fc41e354c73bbfe6813d447  test.gguf:tensor_7
sha256    4c0410cd3c500f078ae5b21e8dc9eb79e29112713b2ab58a882f82a3868d4d75  test.gguf:tensor_7
xxh64     936df61f5d64261f  test.gguf:tensor_8
sha1      c2490296d789a4f34398a337fed8377d943d9f06  test.gguf:tensor_8
sha256    c4401313feeba0261275c3b25bd2d8fe40ce04e0f440c2980ed0e9674c30ff01  test.gguf:tensor_8
xxh64     93fd20c64421c081  test.gguf:tensor_9
sha1      7047ce1e78437a6884337a3751c7ee0421918a65  test.gguf:tensor_9
sha256    23d57cf0d7a6e90b0b3616b41300e0cd354781e812add854a5f95aa55f2bc514  test.gguf:tensor_9
xxh64     5a54d3aad816f302  test.gguf
sha1      d15be52c4ff213e823cb6dd13af7ee2f978e7042  test.gguf
sha256    7dd641b32f59b60dbd4b5420c4b0f6321ccf48f58f6ae201a3dbc4a58a27c6e4  test.gguf
```
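Because each manifest line carries a hash type, a hex digest and an entry name, a quick format check is possible before trusting a manifest. The sketch below is an assumption-laden illustration rather than part of the tool: the helper `check_manifest_format` is hypothetical, it assumes lowercase hex digests as in the example above, and it relies on the standard digest sizes (16 hex characters for xxh64, 40 for sha1, 64 for sha256).

```bash
#!/usr/bin/env bash
# Sketch: sanity-check manifest lines of the form "<type>  <hex digest>  <name>".
# Expected digest lengths in hex characters: xxh64=16, sha1=40, sha256=64.
check_manifest_format() {
    awk 'BEGIN { want["xxh64"] = 16; want["sha1"] = 40; want["sha256"] = 64 }
         NF == 0 { next }                       # skip blank lines
         !($1 in want) || length($2) != want[$1] || $2 !~ /^[0-9a-f]+$/ {
             printf "line %d looks malformed: %s\n", NR, $0
             bad = 1
         }
         END { exit bad }' "$1"
}

# Sample input: the three whole-model lines from the manifest above.
workdir=$(mktemp -d)
cat > "$workdir/sample.manifest" <<'EOF'
xxh64     5a54d3aad816f302  test.gguf
sha1      d15be52c4ff213e823cb6dd13af7ee2f978e7042  test.gguf
sha256    7dd641b32f59b60dbd4b5420c4b0f6321ccf48f58f6ae201a3dbc4a58a27c6e4  test.gguf
EOF

if check_manifest_format "$workdir/sample.manifest"; then
    format_ok=yes
else
    format_ok=no
fi
echo "manifest format ok: $format_ok"
rm -rf "$workdir"
```

This only checks the shape of the file; whether the digests actually match a model is still decided by `--check`.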

We can then use the normal check command, which by default picks the hash with the highest security strength and verifies against that:

```bash
$ ./llama-gguf-hash --check test.gguf.manifest test.gguf
manifest  test.gguf.manifest  sha256  sha1  xxh64
sha256    c0510d38fa060c46265e0160a85c7243096b01dd31c2f355bdbb5516b20de1bd  test.gguf:tensor_0  -  Ok
sha256    8514cbcc73692a2c56bd7a33a022edd5ff819614bd23b19915d7224387f397a7  test.gguf:tensor_1  -  Ok
sha256    947e6b36e20f2cc95e1d2ce1c1669d813d574657ac6b5ac5196158d454d35180  test.gguf:tensor_2  -  Ok
sha256    423b044e016d8ac73c39f23f60bf01bedef5ecb03c0230accd824c91fe86f1a1  test.gguf:tensor_3  -  Ok
sha256    79737cb3912d4201384cf7f16a1a37ff7823f23ea796cb205b6ca361ab9e3ebf  test.gguf:tensor_4  -  Ok
sha256    60949be8298eced0ecdde64487643d018407bd261691e061d9e9c3dbc9fd358b  test.gguf:tensor_5  -  Ok
sha256    574f4c46ff384a3b9a225eb955d2a871847a2e8b3fa59387a8252832e92ef7b0  test.gguf:tensor_6  -  Ok
sha256    4c0410cd3c500f078ae5b21e8dc9eb79e29112713b2ab58a882f82a3868d4d75  test.gguf:tensor_7  -  Ok
sha256    c4401313feeba0261275c3b25bd2d8fe40ce04e0f440c2980ed0e9674c30ff01  test.gguf:tensor_8  -  Ok
sha256    23d57cf0d7a6e90b0b3616b41300e0cd354781e812add854a5f95aa55f2bc514  test.gguf:tensor_9  -  Ok
sha256    7dd641b32f59b60dbd4b5420c4b0f6321ccf48f58f6ae201a3dbc4a58a27c6e4  test.gguf  -  Ok

Verification results for test.gguf.manifest - Success
```
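The "highest security strength" default can be pictured as a simple preference order. The sketch below is a guess at the idea, not taken from the tool's source: the helper `strongest_hash_type` is hypothetical, the ordering sha256 > sha1 > xxh64 is an assumption based on the output above, and the digests are made-up placeholders.

```bash
#!/usr/bin/env bash
# Sketch: report the strongest hash type present in a manifest, assuming
# the preference order sha256 > sha1 > xxh64.
strongest_hash_type() {
    local kind
    for kind in sha256 sha1 xxh64; do
        # A type counts as present if any line starts with its name.
        if grep -q "^${kind}[[:space:]]" "$1"; then
            echo "$kind"
            return 0
        fi
    done
    return 1    # no known hash type found
}

# Sample manifest without sha256 entries (placeholder digests).
workdir=$(mktemp -d)
cat > "$workdir/partial.manifest" <<'EOF'
xxh64     aaaaaaaaaaaaaaaa  model.gguf
sha1      bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb  model.gguf
EOF

picked=$(strongest_hash_type "$workdir/partial.manifest")
echo "would verify with: $picked"
rm -rf "$workdir"
```

With no sha256 lines in the sample, the sketch falls back to sha1, the next entry in the assumed order.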

Or we may explicitly ask for a faster hash like:

```bash
$ ./llama-gguf-hash --check test.gguf.manifest --xxh64 test.gguf
manifest  test.gguf.manifest  sha256  sha1  xxh64
xxh64     f66e9cd66a4396a0  test.gguf:tensor_0  -  Ok
xxh64     7d3a1f9ac04d0537  test.gguf:tensor_1  -  Ok
xxh64     a0af5d700049693b  test.gguf:tensor_2  -  Ok
xxh64     e83fddf559d7b6a6  test.gguf:tensor_3  -  Ok
xxh64     1257733306b7992d  test.gguf:tensor_4  -  Ok
xxh64     d238d16ba4711e58  test.gguf:tensor_5  -  Ok
xxh64     3fbc3b65ab8c7f39  test.gguf:tensor_6  -  Ok
xxh64     c22021c29854f093  test.gguf:tensor_7  -  Ok
xxh64     936df61f5d64261f  test.gguf:tensor_8  -  Ok
xxh64     93fd20c64421c081  test.gguf:tensor_9  -  Ok
xxh64     5a54d3aad816f302  test.gguf  -  Ok

Verification results for test.gguf.manifest - Success
```

Or maybe we just want to check that all the hashes are valid:

```bash
$ ./llama-gguf-hash --check test.gguf.manifest --all test.gguf
manifest  test.gguf.manifest  sha256  sha1  xxh64
xxh64     f66e9cd66a4396a0  test.gguf:tensor_0  -  Ok
sha1      59f79ecefd8125a996fdf419239051a7e99e5f20  test.gguf:tensor_0  -  Ok
sha256    c0510d38fa060c46265e0160a85c7243096b01dd31c2f355bdbb5516b20de1bd  test.gguf:tensor_0  -  Ok
xxh64     7d3a1f9ac04d0537  test.gguf:tensor_1  -  Ok
sha1      4765f592eacf096df4628ba59476af94d767080a  test.gguf:tensor_1  -  Ok
sha256    8514cbcc73692a2c56bd7a33a022edd5ff819614bd23b19915d7224387f397a7  test.gguf:tensor_1  -  Ok
xxh64     a0af5d700049693b  test.gguf:tensor_2  -  Ok
sha1      25cbfbad4513cc348e2c95ebdee69d6ff2fd8753  test.gguf:tensor_2  -  Ok
sha256    947e6b36e20f2cc95e1d2ce1c1669d813d574657ac6b5ac5196158d454d35180  test.gguf:tensor_2  -  Ok
xxh64     e83fddf559d7b6a6  test.gguf:tensor_3  -  Ok
sha1      a9cba73e2d90f2ee3dae2548caa42bef3fe6a96c  test.gguf:tensor_3  -  Ok
sha256    423b044e016d8ac73c39f23f60bf01bedef5ecb03c0230accd824c91fe86f1a1  test.gguf:tensor_3  -  Ok
xxh64     1257733306b7992d  test.gguf:tensor_4  -  Ok
sha1      d7bc61db93bb685ce9d598da89717c66729b7543  test.gguf:tensor_4  -  Ok
sha256    79737cb3912d4201384cf7f16a1a37ff7823f23ea796cb205b6ca361ab9e3ebf  test.gguf:tensor_4  -  Ok
xxh64     d238d16ba4711e58  test.gguf:tensor_5  -  Ok
sha1      0706566c198fe1072f37e0a5135b4b5f23654c52  test.gguf:tensor_5  -  Ok
sha256    60949be8298eced0ecdde64487643d018407bd261691e061d9e9c3dbc9fd358b  test.gguf:tensor_5  -  Ok
xxh64     3fbc3b65ab8c7f39  test.gguf:tensor_6  -  Ok
sha1      73922a0727226a409049f6fc3172a52219ca6f00  test.gguf:tensor_6  -  Ok
sha256    574f4c46ff384a3b9a225eb955d2a871847a2e8b3fa59387a8252832e92ef7b0  test.gguf:tensor_6  -  Ok
xxh64     c22021c29854f093  test.gguf:tensor_7  -  Ok
sha1      efc39cece6a951188fc41e354c73bbfe6813d447  test.gguf:tensor_7  -  Ok
sha256    4c0410cd3c500f078ae5b21e8dc9eb79e29112713b2ab58a882f82a3868d4d75  test.gguf:tensor_7  -  Ok
xxh64     936df61f5d64261f  test.gguf:tensor_8  -  Ok
sha1      c2490296d789a4f34398a337fed8377d943d9f06  test.gguf:tensor_8  -  Ok
sha256    c4401313feeba0261275c3b25bd2d8fe40ce04e0f440c2980ed0e9674c30ff01  test.gguf:tensor_8  -  Ok
xxh64     93fd20c64421c081  test.gguf:tensor_9  -  Ok
sha1      7047ce1e78437a6884337a3751c7ee0421918a65  test.gguf:tensor_9  -  Ok
sha256    23d57cf0d7a6e90b0b3616b41300e0cd354781e812add854a5f95aa55f2bc514  test.gguf:tensor_9  -  Ok
xxh64     5a54d3aad816f302  test.gguf  -  Ok
sha1      d15be52c4ff213e823cb6dd13af7ee2f978e7042  test.gguf  -  Ok
sha256    7dd641b32f59b60dbd4b5420c4b0f6321ccf48f58f6ae201a3dbc4a58a27c6e4  test.gguf  -  Ok

Verification results for test.gguf.manifest - Success
```

## Crypto/Hash Libraries Used

These micro C library dependencies were installed via the [clib c package manager](https://github.com/clibs)

- https://github.com/Cyan4973/xxHash
- https://github.com/clibs/sha1/
- https://github.com/jb55/sha256.c