**IMPORTANT:** *Please note that this is a merged module: it is part of the Redis binary now, and you don't need to build it and load it into Redis. Compiling Redis version 8 or greater will make the Vector Sets commands available out of the box. However, you can still compile this module as a shared library in order to load it into older versions of Redis.*

This module implements Vector Sets for Redis, a new Redis data type similar to Sorted Sets but with string elements associated with a vector instead of a score. The fundamental goal of Vector Sets is to make it possible to add items, and later retrieve a subset of the added items that are the most similar to a specified vector (often a learned embedding), or the most similar to the vector of an element that is already part of the Vector Set.

Moreover, Vector Sets implement optional filtered search capabilities: it is possible to associate attributes with all or with a subset of elements in the set, and then, using the `FILTER` option of the `VSIM` command, ask for items similar to a given vector that also pass a filter specified as a simple mathematical expression (like `".year > 1950"`). This means that **you can have vector similarity and scalar filters at the same time**.

## Installation

**WARNING:** If you are running **Redis 8.0 RC1 or greater** you don't need to install anything: just compile Redis, and the Vector Sets commands will be part of the default install. Otherwise, to test Vector Sets with older Redis versions, follow the instructions below.

Build with:

    make

Then load the module with the following command line, or by inserting the needed directives in the `redis.conf` file.

    ./redis-server --loadmodule vset.so

To run tests, I suggest starting the server with:

    ./redis-server --save "" --enable-debug-command yes

Then execute the tests with:

    ./test.py

## Reference of available commands

**VADD: add items into a vector set**

    VADD key [REDUCE dim] FP32|VALUES vector element [CAS] [NOQUANT | Q8 | BIN]
         [EF build-exploration-factor] [SETATTR <attributes>] [M <numlinks>]

Add a new element into the vector set specified by the key.
The vector can be provided as an FP32 blob of values, or as floating point
numbers as strings, prefixed by the number of elements (3 in the example):

    VADD mykey VALUES 3 0.1 1.2 0.5 my-element

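The FP32 form expects a binary blob of consecutive 4-byte IEEE 754 floats. A minimal client-side sketch of building such a blob (little-endian byte order is an assumption here; check your platform and client if in doubt):

```python
import struct

def fp32_blob(vector):
    """Pack a list of floats as consecutive 4-byte IEEE 754 floats,
    the kind of payload the FP32 form of VADD expects.
    Little-endian ("<") is assumed for illustration."""
    return struct.pack(f"<{len(vector)}f", *vector)

blob = fp32_blob([0.1, 1.2, 0.5])
print(len(blob))  # 3 components * 4 bytes = 12

# A hypothetical call with a Redis client would then look like:
# r.execute_command("VADD", "mykey", "FP32", blob, "my-element")
```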
Meaning of the options:

`REDUCE` implements random projection, in order to reduce the dimensionality of the vector. The projection matrix is saved and reloaded along with the vector set. **Please note that** the `REDUCE` option must be passed immediately before the vector, as in `REDUCE 50 VALUES ...`.

`CAS` performs the operation partially using threads, in a check-and-set style. The neighbor candidates collection, which is slow, is performed in the background, while the command is executed in the main thread.

`NOQUANT` forces the vector to be created (in the first VADD call to a given key) without integer 8 quantization, which is otherwise the default.

`BIN` forces the vector to use binary quantization instead of int8. This is much faster and uses less memory, but impacts the recall quality.

`Q8` forces the vector to use signed 8 bit quantization. This is the default, and the option only exists in order to check, at insertion time, that the vector set is of the same format.

`EF` plays a role in the effort made to find good candidates when connecting the new node to the existing HNSW graph. The default is 200. Using a larger value may help to achieve a better recall. To improve the recall it is also possible to increase `EF` during `VSIM` searches.

`SETATTR` associates attributes to the newly created entry, or updates the attributes of an existing entry. It is the same as calling the `VSETATTR` command separately, so please check the documentation of that command in the filtered search section of this document.

`M` defaults to 16 and is the famous HNSW `M` parameter. It is the maximum number of connections that each node of the graph has with other nodes: more connections mean more memory, but a better ability to explore the graph. Nodes at layer zero (every node exists at least at layer zero) have `M*2` connections, while the other layers only have `M` connections. This means that, for instance, an `M` of 64 will use at least 1024 bytes of memory for each node! That is, `64 links * 2 times * 8 bytes pointers`, and even more, since on average each node has something like 1.33 layers (but the other layers have just `M` connections, instead of `M*2`). If you don't have a recall quality problem, the default is fine, and uses a limited amount of memory.

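The arithmetic above can be sketched as follows (the pointer size and the 1.33 average-layers figure come straight from the paragraph; actual memory use depends on the implementation):

```python
def hnsw_links_bytes(m, ptr_size=8, avg_layers=1.33):
    """Rough estimate of per-node link memory in an HNSW graph.

    Layer 0 stores M*2 links; each additional layer stores M links.
    With ~1.33 layers per node on average, a node has on average
    0.33 extra layers beyond layer zero."""
    layer0 = m * 2 * ptr_size
    upper = (avg_layers - 1) * m * ptr_size
    return layer0 + upper

print(hnsw_links_bytes(64))  # M=64: at least 64*2*8 = 1024 bytes, plus upper layers
```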
**VSIM: return elements by vector similarity**

    VSIM key [ELE|FP32|VALUES] <vector or element> [WITHSCORES] [WITHATTRIBS] [COUNT num] [EPSILON delta] [EF search-exploration-factor] [FILTER expression] [FILTER-EF max-filtering-effort] [TRUTH] [NOTHREAD]

The command returns similar vectors. For simplicity (and brevity), in the following example, instead of providing a vector using FP32 or VALUES (like in `VADD`), we ask for elements having a vector similar to that of an element already in the vector set:

    > VSIM word_embeddings ELE apple
     1) "apple"
     2) "apples"
     3) "pear"
     4) "fruit"
     5) "berry"
     6) "pears"
     7) "strawberry"
     8) "peach"
     9) "potato"
    10) "grape"

It is possible to specify a `COUNT` and also to get the similarity score (from 1 to 0, where 1 means identical vectors and 0 means opposite vectors) between the query and the returned items.

    > VSIM word_embeddings ELE apple WITHSCORES COUNT 3
    1) "apple"
    2) "0.9998867657923256"
    3) "apples"
    4) "0.8598527610301971"
    5) "pear"
    6) "0.8226882219314575"

It is also possible to specify `EPSILON`, a floating point number between 0 and 1, in order to only return elements whose distance from the query is no further than the specified one. In vector sets, the returned elements have a similarity score (when compared to the query vector) that is between 1 and 0, where 1 means identical and 0 means opposite vectors. If, for instance, the `EPSILON` option is specified with an argument of 0.2, we will only get elements that have a similarity of 0.8 or better (a distance < 0.2). This is useful when a large `COUNT` is specified, yet we don't want elements that are too far away from our query vector.

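The relation between `EPSILON` and the similarity score can be sketched with plain client-side arithmetic (the score list is invented for illustration):

```python
def passes_epsilon(similarity, epsilon):
    """similarity is in [0, 1] (1 = identical); distance = 1 - similarity.
    An element passes when its distance is strictly below epsilon."""
    return (1.0 - similarity) < epsilon

# With EPSILON 0.2, only elements with similarity > 0.8 are kept:
scores = [0.9998, 0.8598, 0.8226, 0.71]
kept = [s for s in scores if passes_epsilon(s, 0.2)]
print(kept)  # [0.9998, 0.8598, 0.8226]
```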
The `EF` argument is the exploration factor: the higher it is, the slower the command becomes, but the better the index is explored to find nodes that are near to our query. Sensible values range from 50 to 1000.

The `TRUTH` option forces the command to perform a linear scan of all the entries inside the set, without using the HNSW graph search, so it returns the best matching elements (the perfect result set) that can be used in order to easily calculate the recall. Of course the linear scan is `O(N)`, so it is much slower than the `log(N)` (considering a small `COUNT`) provided by the HNSW index.

The `NOTHREAD` option forces the command to execute the search on the data structure in the main thread. Normally `VSIM` spawns a thread instead. This may be useful for benchmarking purposes, or when we work with extremely small vector sets and don't want to pay the cost of spawning a thread. It is possible that in the future this option will be used automatically by Redis when it detects small vector sets. Note that this option blocks the server for all the time needed to complete the command, so it is a source of potential latency issues: if you are in doubt, never use it.

The `WITHSCORES` option returns, for each returned element, a floating point number representing how near the element is to the query, as a similarity between 0 and 1, where 0 means the vectors are opposite, and 1 means they are pointing exactly in the same direction (maximum similarity).

The `WITHATTRIBS` option returns, for each element, the JSON attribute associated with the element, or NULL for elements missing an attribute.

For the `FILTER` and `FILTER-EF` options, please check the filtered search section of this documentation.

Note that when `WITHSCORES` and `WITHATTRIBS` are provided at the same time, the RESP2 reply guarantees that the returned elements are always in the sequence *ele*, *score*, *attribs*. RESP3 replies instead map each element to its score or attribute when just one option is provided, or to a two-item `[score, attrib]` array when both options are used.

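A client-side sketch of regrouping the flat RESP2 reply shape described above (the reply list itself is made up for illustration):

```python
def group_resp2_reply(flat, with_scores=True, with_attribs=True):
    """Group a flat RESP2 VSIM reply into per-element tuples.

    With both WITHSCORES and WITHATTRIBS, the reply repeats
    element, score, attribs for each result."""
    step = 1 + int(with_scores) + int(with_attribs)
    return [tuple(flat[i:i + step]) for i in range(0, len(flat), step)]

reply = ["apple", "0.99", '{"year": 1950}', "pear", "0.82", None]
print(group_resp2_reply(reply))
# [('apple', '0.99', '{"year": 1950}'), ('pear', '0.82', None)]
```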
**VDIM: return the dimension of the vectors inside the vector set**

    VDIM keyname

Example:

    > VDIM word_embeddings
    (integer) 300

Note that in the case of vectors that were populated using the `REDUCE`
option, for random projection, the vector set will report the size of
the projected (reduced) dimension. Yet the user should perform all the
queries using full-size vectors.

**VCARD: return the number of elements in a vector set**

    VCARD key

Example:

    > VCARD word_embeddings
    (integer) 3000000

**VREM: remove elements from vector set**

    VREM key element

Example:

    > VADD vset VALUES 3 1 0 1 bar
    (integer) 1
    > VREM vset bar
    (integer) 1
    > VREM vset bar
    (integer) 0

VREM does not perform tombstone / logical deletion, but will actually reclaim
the memory from the vector set, so it is safe to add and remove elements
in a vector set in the context of long running applications that continuously
update the same index.

**VEMB: return the approximated vector of an element**

    VEMB key element

Example:

    > VEMB word_embeddings SQL
    1) "0.18208661675453186"
    2) "0.08535309880971909"
    3) "0.1365649551153183"
    4) "-0.16501599550247192"
    5) "0.14225517213344574"
    ... 295 more elements ...

Because vector sets perform insertion time normalization and optional
quantization, the returned vector can be an approximation. `VEMB` will take
care to de-quantize and de-normalize the vector before returning it.

It is possible to ask VEMB to return raw data, that is, the internal representation used by the vector: fp32, int8, or a bitmap for binary quantization. This behavior is triggered by the `RAW` option of VEMB:

    VEMB word_embedding apple RAW

In this case the return value of the command is an array of three or more elements:

1. The name of the quantization used, that is one of: "fp32", "bin", "q8".
2. A string blob containing the raw data: 4-byte fp32 floats for fp32, a bitmap for binary quantization, or an int8 byte array for q8 quantization.
3. A float representing the L2 norm of the vector before normalization. You need to multiply by this value if you want to de-normalize the vector for any reason.

For q8 quantization, an additional element is also returned: the quantization
range, so the integers from -127 to 127 represent (normalized) components
in the range `-range`, `+range`.

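Given that description, de-quantizing a q8 raw reply could be sketched like this (the blob and values are invented, and the exact scaling is an assumption based on the text above):

```python
import struct

def dequantize_q8(blob, l2_norm, qrange):
    """De-quantize int8 components: q in [-127, 127] maps to
    [-qrange, +qrange] (normalized values), then multiply by the
    L2 norm to de-normalize, mirroring the VEMB RAW description."""
    ints = struct.unpack(f"{len(blob)}b", blob)
    return [q / 127.0 * qrange * l2_norm for q in ints]

blob = struct.pack("3b", 127, -127, 0)
print(dequantize_q8(blob, l2_norm=2.0, qrange=0.5))  # [1.0, -1.0, 0.0]
```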
**VISMEMBER: test if a given element already exists**

This command returns 1 (or true) if the specified element is already in the vector set, otherwise 0 (or false) is returned.

    VISMEMBER key element

As with other Redis existence-check commands, if the key does not exist it is treated as empty, thus the element is reported as non-existing.

**VRANGE: return elements in a lexicographical range**

    VRANGE key start end count

The `VRANGE` command has many different use cases, but its main goal is to
provide a stateless iterator for the elements inside a vector set: that is,
it allows retrieving all the elements inside a vector set in small amounts
for each call, without an explicit cursor, and with guarantees about what
the user will miss in case the vector set is changing (elements added and/or
removed) during the iteration.

The command usage is straightforward:

```
> VRANGE word_embeddings_int8 [Redis + 10
 1) "Redis"
 2) "Rediscover"
 3) "Rediscover_Ashland"
 4) "Rediscover_Northern_Ireland"
 5) "Rediscovered"
 6) "Rediscovered_Bookshop"
 7) "Rediscovering"
 8) "Rediscovering_God"
 9) "Rediscovering_Lost"
10) "Rediscovers"
```

The above command returns 10 (or fewer, if fewer are available in the specified range) elements from "Redis" (inclusive) to the maximum possible element. The comparison is performed byte by byte, as `memcmp()` would do; in this way the elements have a total order. The start and end of the range can each be either a string, prefixed by `[` or `(` (the prefix is mandatory) to tell the command whether the range is inclusive or exclusive, or the special symbols `-` and `+`, which mean the minimum and maximum possible element respectively.

So for instance if I want to iterate all the elements, ten elements for each call, I'll proceed like this:

```
> VRANGE mykey - + 10
 1) "a"
 2) "a-league"
 3) "a."
 4) "a.d."
 5) "a.k.a."
 6) "a.m."
 7) "a1"
 8) "a2"
 9) "a3"
10) "a7"
```

This gives me the first 10 elements. Then I want the next ten elements
starting from the last element in the previous result, but *excluding* it,
so the next range will use the `(` prefix with the last element of the
previous call, which was `"a7"`:

```
> VRANGE mykey (a7 + 10
 1) "a930913"
 2) "aa"
 3) "aaa"
 4) "aaron"
 5) "ab"
 6) "aba"
 7) "abandon"
 8) "abandoned"
 9) "abandoning"
10) "abandonment"
```

And so forth.

The count argument is mandatory, however a negative count means to return all the elements in the set. This means that `VRANGE mykey - + -1` will return every element. Of course, iterating like that may block the server for a long time.

The command time complexity is O(1) to seek to the element (assuming the element is of reasonable size), since we use a Radix Tree in the underlying implementation, plus the time to yield the M requested elements. So if M is small, each call executes in constant time; however, iterating a whole set of N elements (via multiple calls) is O(N). Basically: this command, with a small count, will never produce latency issues in the Redis server.

In case the elements are changing continuously as the set is iterated, the guarantees are very simple: each call returns exactly the elements that are present in the requested range at the moment the `VRANGE` command executes. In other words, an iteration performed in this way is *guaranteed* to return all the elements that stayed within the vector set from the start to the end of the iteration. Elements removed or added in the meantime may or may not be returned, depending on the moment they were added or removed.

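The iteration pattern above can be sketched client-side like this, simulated against an in-memory sorted list instead of a real server, just to show the exclusive-prefix cursor logic:

```python
def vrange(sorted_elems, start, count):
    """Simulate VRANGE over a lexicographically ordered list.

    start is '-' (minimum), '[x' (inclusive) or '(x' (exclusive);
    the end is implicitly '+' here, for brevity."""
    if start == "-":
        pred = lambda e: True
    elif start.startswith("["):
        lo = start[1:]
        pred = lambda e: e >= lo
    else:  # '(' means exclusive
        lo = start[1:]
        pred = lambda e: e > lo
    return [e for e in sorted_elems if pred(e)][:count]

elems = sorted(["a", "a7", "aa", "ab", "b", "c"])
out, cursor = [], "-"
while True:
    page = vrange(elems, cursor, 2)
    if not page:
        break
    out.extend(page)
    cursor = "(" + page[-1]  # exclude the last returned element
print(out)  # all the elements, two per call
```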
**VLINKS: introspection command that shows neighbors for a node**

    VLINKS key element [WITHSCORES]

The command reports the neighbors for each level.

**VINFO: introspection command that shows info about a vector set**

    VINFO key

Example:

    > VINFO word_embeddings
     1) quant-type
     2) int8
     3) vector-dim
     4) (integer) 300
     5) size
     6) (integer) 3000000
     7) max-level
     8) (integer) 12
     9) vset-uid
    10) (integer) 1
    11) hnsw-max-node-uid
    12) (integer) 3000000

**VSETATTR: associate or remove the JSON attributes of elements**

    VSETATTR key element "{... json ...}"

Each element of a vector set can be optionally associated with a JSON string
in order to use the `FILTER` option of `VSIM` to filter elements by scalars
(see the filtered search section for more information). This command can set,
update (if already set) or delete (if set to an empty string) the
associated JSON attributes of an element.

The command returns 0 if the element or the key doesn't exist, without
raising an error; otherwise 1 is returned, and the element attributes
are set or updated.

**VGETATTR: retrieve the JSON attributes of elements**

    VGETATTR key element

The command returns the JSON attribute associated with an element, or
null if the element has no associated attribute, if the element does not
exist, or if the key does not exist.

**VRANDMEMBER: return random members from a vector set**

    VRANDMEMBER key [count]

Return one or more random elements from a vector set.

The semantics of this command are similar to Redis's native SRANDMEMBER command:

- When called without a count, returns a single random element from the set, as a single string (no array reply).
- When called with a positive count, returns up to count distinct random elements (no duplicates).
- When called with a negative count, returns count random elements, potentially with duplicates.
- If the count value is positive and larger than the set size, the entire set is returned.

If the key doesn't exist, the command returns a Null reply if count is not given, or an empty array if a count is provided.

Examples:

    > VADD vset VALUES 3 1 0 0 elem1
    (integer) 1
    > VADD vset VALUES 3 0 1 0 elem2
    (integer) 1
    > VADD vset VALUES 3 0 0 1 elem3
    (integer) 1

    # Return a single random element
    > VRANDMEMBER vset
    "elem2"

    # Return 2 distinct random elements
    > VRANDMEMBER vset 2
    1) "elem1"
    2) "elem3"

    # Return 3 random elements with possible duplicates
    > VRANDMEMBER vset -3
    1) "elem2"
    2) "elem2"
    3) "elem1"

    # Return more elements than in the set (returns all elements)
    > VRANDMEMBER vset 10
    1) "elem1"
    2) "elem2"
    3) "elem3"

    # When key doesn't exist
    > VRANDMEMBER nonexistent
    (nil)
    > VRANDMEMBER nonexistent 3
    (empty array)

This command is particularly useful for:

1. Selecting random samples from a vector set for testing or training.
2. Performance testing by retrieving random elements for subsequent similarity searches.

When the user asks for unique elements (positive count) the implementation optimizes for two scenarios:

- For small sample sizes (less than 20% of the set size), it uses a dictionary to avoid duplicates, and performs a real random walk inside the graph.
- For large sample sizes (more than 20% of the set size), it starts from a random node and sequentially traverses the internal list, providing faster performance but not truly "random" elements.

The command has `O(N)` worst-case time complexity when requesting many unique elements (it uses linear scanning), or `O(M*log(N))` complexity when the user asks for `M` random elements in a vector set of `N` elements, with `M` much smaller than `N`.

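The count semantics can be mimicked client-side with Python's standard random helpers (a sketch of the documented behavior, not of the server's actual algorithm):

```python
import random

def randmember(elements, count=None):
    """Mimic VRANDMEMBER count semantics over a plain list."""
    if count is None:
        return random.choice(elements)          # single element, no array
    if count >= 0:
        # positive: distinct elements, capped at the set size
        return random.sample(elements, min(count, len(elements)))
    # negative: |count| elements, duplicates allowed
    return random.choices(elements, k=-count)

elems = ["elem1", "elem2", "elem3"]
print(len(randmember(elems, 10)))   # 3: capped at the set size
print(len(randmember(elems, -5)))   # 5: duplicates allowed
```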
# Filtered search

Each element of the vector set can be associated with a set of attributes specified as a JSON blob:

    > VADD vset VALUES 3 1 1 1 a SETATTR '{"year": 1950}'
    (integer) 1
    > VADD vset VALUES 3 -1 -1 -1 b SETATTR '{"year": 1951}'
    (integer) 1

Specifying an attribute with the `SETATTR` option of `VADD` is exactly equivalent to adding an element and then setting (or updating, if already set) its attributes JSON string. The symmetrical `VGETATTR` command returns the attribute associated with a given element.

    > VADD vset VALUES 3 0 1 0 c
    (integer) 1
    > VSETATTR vset c '{"year": 1952}'
    (integer) 1
    > VGETATTR vset c
    "{\"year\": 1952}"

At this point, I can use the FILTER option of VSIM to ask only for the subset of elements that satisfy my expression:

    > VSIM vset VALUES 3 0 0 0 FILTER '.year > 1950'
    1) "c"
    2) "b"

The items are returned again in order of similarity (most similar first), but only the items whose year field matches the expression are returned.

The expressions are similar to what you would write inside the `if` statement of JavaScript or other familiar programming languages: you can use `and`, `or`, the obvious math operators like `+`, `-`, `/`, `>=`, `<`, ... and so forth (see the expressions section for more info). The selectors of the JSON object attributes start with a dot followed by the name of the key inside the JSON object.

Elements with invalid JSON, or missing the specified field, **are considered as not matching** the expression, but will not generate any error at runtime.

## FILTER expressions capabilities

FILTER expressions allow you to perform complex filtering on vector similarity results using a JavaScript-like syntax. The expression is evaluated against each element's JSON attributes, with only elements that satisfy the expression being included in the results.

### Expression Syntax

Expressions support the following operators and capabilities:

1. **Arithmetic operators**: `+`, `-`, `*`, `/`, `%` (modulo), `**` (exponentiation)
2. **Comparison operators**: `>`, `>=`, `<`, `<=`, `==`, `!=`
3. **Logical operators**: `and`/`&&`, `or`/`||`, `!`/`not`
4. **Containment operator**: `in`
5. **Parentheses** for grouping: `(...)`

### Selector Notation

Attributes are accessed using dot notation:

- `.year` references the "year" attribute
- `.movie.year` would **NOT** reference the "year" field inside a "movie" object: only keys at the first level of the JSON object are accessible.

### JSON and expressions data types

Expressions can work with:

- Numbers (double precision floats)
- Strings (enclosed in single or double quotes)
- Booleans (no native type: they are represented as 1 for true, 0 for false)
- Arrays (for use with the `in` operator: `value in [1, 2, 3]`)

JSON attributes are converted in this way:

- Numbers are converted to numbers.
- Strings to strings.
- Booleans to the numbers 0 or 1.
- Arrays to tuples (for the `in` operator), but only if composed of just numbers and strings.

Any other type is ignored, and accessing it will make the expression evaluate to false.

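A client-side sketch of these conversion and non-match rules (the helper names are made up; the real evaluator lives inside the module):

```python
import json

def convert_attr(value):
    """Convert a JSON value per the rules above; None means unusable."""
    if isinstance(value, bool):
        return 1 if value else 0          # booleans become 1 / 0
    if isinstance(value, (int, float, str)):
        return value
    if isinstance(value, list) and all(
            isinstance(v, (int, float, str)) and not isinstance(v, bool)
            for v in value):
        return tuple(value)               # arrays of scalars become tuples
    return None                           # objects, nulls, nested arrays: ignored

def matches(attr_json, field, predicate):
    """Missing fields, invalid JSON or unusable values never match."""
    try:
        obj = json.loads(attr_json)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(obj, dict) or field not in obj:
        return False
    value = convert_attr(obj[field])
    return False if value is None else predicate(value)

print(matches('{"year": 1951}', "year", lambda y: y > 1950))  # True
print(matches('not json', "year", lambda y: y > 1950))        # False
print(matches('{"other": 1}', "year", lambda y: y > 1950))    # False
```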
### The IN operator

The `in` operator works in two ways. It can test for membership in an array, as in:

    5 in [1, 2, 3]
    "foo" in [1, "foo", "bar"]

But it can also check for substrings, in case both operands are strings:

    "foo" in "barfoobar" # Will evaluate to true
    "zap" in "foobar"    # Will evaluate to false

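Python's own `in` happens to behave the same way in both cases, so the semantics can be demonstrated directly (this is an analogy, not the module's evaluator):

```python
# Membership in an array (a tuple, after attribute conversion):
print(5 in (1, 2, 3))              # False
print("foo" in (1, "foo", "bar"))  # True

# Substring test when both operands are strings:
print("foo" in "barfoobar")  # True
print("zap" in "foobar")     # False
```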
### Examples

```
# Find items from the 1980s
VSIM movies VALUES 3 0.5 0.8 0.2 FILTER '.year >= 1980 and .year < 1990'

# Find action movies with high ratings
VSIM movies VALUES 3 0.5 0.8 0.2 FILTER '.genre == "action" and .rating > 8.0'

# Find movies directed by either Spielberg or Nolan
VSIM movies VALUES 3 0.5 0.8 0.2 FILTER '.director in ["Spielberg", "Nolan"]'

# Complex condition with numerical operations
VSIM movies VALUES 3 0.5 0.8 0.2 FILTER '(.year - 2000) ** 2 < 100 and .rating / 2 > 4'
```

### Error Handling

Elements with any of the following conditions are considered not matching:

- Missing the queried JSON attribute
- Having invalid JSON in their attributes
- Having a JSON value that cannot be converted to the expected type

This behavior allows you to safely filter on optional attributes without generating errors.

### FILTER effort

The `FILTER-EF` option controls the maximum effort spent when filtering vector search results.

When performing vector similarity search with filtering, Vector Sets perform the standard similarity search while applying the filter expression to each node. Since many results might be filtered out, Vector Sets may need to examine many more candidates than the requested `COUNT` to ensure that enough matching results are returned. In fact, if the elements matching the filter are very rare, or if there are fewer matching elements than the specified count, this would trigger a full scan of the HNSW graph.

For this reason, by default, the maximum effort is limited to a reasonable number of explored nodes.

### Modifying the FILTER effort

1. By default, Vector Sets will explore up to `COUNT * 100` candidates to find matching results.
2. You can control this exploration with the `FILTER-EF` parameter.
3. A higher `FILTER-EF` value increases the chances of finding all relevant matches at the cost of increased processing time.
4. A `FILTER-EF` of zero will explore as many nodes as needed in order to actually return the number of elements specified by `COUNT`.
5. Even when a high `FILTER-EF` value is specified, **the implementation will do much less work** if the elements passing the filter are very common, because of the early-stop conditions of the HNSW implementation (once the specified amount of elements is reached and the quality check of the remaining candidates triggers an early stop).

```
VSIM key [ELE|FP32|VALUES] <vector or element> COUNT 10 FILTER '.year > 2000' FILTER-EF 500
```

In this example, Vector Sets will examine up to 500 potential nodes. Of course, if the count is reached before exploring 500 nodes, and the quality checks show that it is not possible to make progress on similarity, the search ends sooner.

### Performance Considerations

- If you have highly selective filters (few items match), use a higher `FILTER-EF`, or just design your application to handle a result set that is smaller than the requested count. Note that the additional elements may be too distant from the query vector anyway.
- For less selective filters, the default should be sufficient.
- Very selective filters with low `FILTER-EF` values may return fewer items than requested.
- Extremely high values may impact performance without significantly improving results.

The optimal `FILTER-EF` value depends on:

1. The selectivity of your filter.
2. The distribution of your data.
3. The required recall quality.

A good practice is to start with the default and increase it if you observe fewer results than expected.

### Testing a large-ish data set

To really see how things work at scale, you can [download](https://antirez.com/word2vec_with_attribs.rdb) the following dataset:

    wget https://antirez.com/word2vec_with_attribs.rdb

It contains the 3 million words of Word2Vec, each having as attribute a JSON object with just the length of the word. Because of the length distribution of words in large amounts of text, where longer words become less and less common, this is ideal for checking how filtering behaves as the filter matches fewer and fewer elements in a vector set.

For instance:

    > VSIM word_embeddings_bin ele "pasta" FILTER ".len == 6"
    1) "pastas"
    2) "rotini"
    3) "gnocci"
    4) "panino"
    5) "salads"
    6) "breads"
    7) "salame"
    8) "sauces"
    9) "cheese"
    10) "fritti"

This will easily retrieve the desired amount of items (`COUNT` is 10 by default) since there are many items of length 6. However:

    > VSIM word_embeddings_bin ele "pasta" FILTER ".len == 33"
    1) "skinless_boneless_chicken_breasts"
    2) "boneless_skinless_chicken_breasts"
    3) "Boneless_skinless_chicken_breasts"

This time, even if we asked for 10 items, we only got 3, since the default filter effort will be `10*100 = 1000`. We can tune this by specifying the effort explicitly, with the risk of our query being slower, of course:

    > VSIM word_embeddings_bin ele "pasta" FILTER ".len == 33" FILTER-EF 10000
    1) "skinless_boneless_chicken_breasts"
    2) "boneless_skinless_chicken_breasts"
    3) "Boneless_skinless_chicken_breasts"
    4) "mozzarella_feta_provolone_cheddar"
    5) "Greatfood.com_R_www.greatfood.com"
    6) "Pepperidge_Farm_Goldfish_crackers"
    7) "Prosecuted_Mobsters_Rebuilt_Dying"
    8) "Crispy_Snacker_Sandwiches_Popcorn"
    9) "risultati_delle_partite_disputate"
    10) "Peppermint_Mocha_Twist_Gingersnap"

This time we get all ten items, even if the last ones are quite far from our query vector. We encourage you to experiment with this test dataset in order to better understand the dynamics of the implementation and the natural tradeoffs of filtered search.

**Keep in mind** that by default, Redis Vector Sets will try to avoid a likely very useless huge scan of the HNSW graph, and will rather return few or no elements at all, since this is almost always what the user actually wants in the context of retrieving *similar* items to the query.

# Single Instance Scalability and Latency

Vector Sets implement a threading model that allows Redis to handle many concurrent requests: by default `VSIM` is always threaded, and `VADD` is not (but can be partially threaded using the `CAS` option). This section explains how the threading and locking mechanisms work, and what to expect in terms of performance.

## Threading Model

- The `VSIM` command runs in a separate thread by default, allowing Redis to continue serving other commands.
- A maximum of 32 threads can run concurrently (defined by `HNSW_MAX_THREADS`).
- When this limit is reached, additional `VSIM` requests are queued; Redis remains responsive and no latency event is generated.
- The `VADD` command with the `CAS` option also leverages threading for the computation-heavy candidate search phase, but the insertion itself is performed in the main thread. `VADD` always runs in sub-millisecond time, so this is not a source of latency, but sustaining many hundreds of writes per second can be challenging for a single instance. Please see the next section about scaling to multiple instances.
- Commands run within Lua scripts, MULTI/EXEC blocks, or from replication are executed in the main thread to ensure consistency.

```
> VSIM vset VALUES 3 1 1 1 FILTER '.year > 2000' # This runs in a thread.
> VADD vset VALUES 3 1 1 1 element CAS # Candidate search runs in a thread.
```

## Locking Mechanism

Vector Sets use a read/write locking mechanism to coordinate access:

- Reads (`VSIM`, `VEMB`, etc.) acquire a read lock, allowing multiple concurrent reads.
- Writes (`VADD`, `VREM`, etc.) acquire a write lock, temporarily blocking all reads.
- When a write lock is requested while reads are in progress, the write operation waits for all reads to complete.
- Once a write lock is granted, all reads are blocked until the write completes.
- Each thread has a dedicated slot for tracking visited nodes during graph traversal, avoiding contention. This improves performance but limits the maximum number of concurrent threads, since each node has a memory cost proportional to the number of slots.

## DEL latency

Deleting a very large vector set (millions of elements) can cause latency spikes, as deletion rebuilds connections between nodes. This may change in the future.
The deletion latency is most noticeable when using `DEL` on a key containing a large vector set, or when the key expires.


## Performance Characteristics

- Search operations (`VSIM`) scale almost linearly with the number of CPU cores available, up to the thread limit. You can expect a Vector Set composed of millions of items, with 300-dimensional vectors and the default int8 quantization, to deliver around 50k `VSIM` operations per second on a single host.
- Insertion operations (`VADD`) are more computationally expensive than searches, and can't be threaded: expect much lower throughput, in the range of a few thousand inserts per second.
- Binary quantization offers significantly faster search performance at the cost of some recall quality, while int8 quantization, the default, seems to have a very small impact on recall quality while significantly improving performance and space efficiency.
- The `EF` parameter has a major impact on both search quality and performance: higher values mean better recall but slower searches.
- Graph traversal time scales logarithmically with the number of elements, making Vector Sets efficient even with millions of vectors.

## Loading / Saving performance

Vector Sets serialize the graph structure to disk exactly as it is in memory, so loading the data back does not require rebuilding the HNSW graph. This means that Redis can load millions of items per minute. For instance, 3 million items with 300-component vectors can be loaded back into memory in around 15 seconds.

# Scaling vector sets to multiple instances

The fundamental way vector sets can be scaled to very large data sets
and to many Redis instances is that a given very large set of vectors
can be partitioned into N different Redis keys, which can also live on
different Redis instances.

For instance, I could add my elements into `key0`, `key1`, `key2`, by hashing
the item in some way, like doing `crc32(item)%3`, effectively splitting
the dataset into three different parts. Then, once I want all the vectors
of my dataset near a given query vector, I can simply perform the
`VSIM` command against all three keys, merging the results by
score (so the commands must be called using the `WITHSCORES` option) on
the client side: once the union of the results is ordered by the
similarity score, the query is equivalent to having a single key `key1+2+3`
containing all the items.
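Client-side, the pattern above can be sketched as follows. The helper names are hypothetical, and the actual per-shard `VSIM key ... WITHSCORES` calls are omitted; the replies are assumed to already be decoded into (element, score) pairs:

```python
import zlib

NUM_SHARDS = 3

def shard_key(item: str) -> str:
    """Pick the Redis key holding this item (key0, key1 or key2)."""
    return f"key{zlib.crc32(item.encode()) % NUM_SHARDS}"

def merge_withscores(replies, count):
    """Merge the (element, score) pairs returned by VSIM ... WITHSCORES
    on each shard into a single result ranked by similarity score."""
    merged = [pair for reply in replies for pair in reply]
    merged.sort(key=lambda pair: pair[1], reverse=True)  # score 1.0 = identical
    return merged[:count]

# Hypothetical replies from the three shards for the same query vector:
replies = [
    [("cat", 0.98), ("cow", 0.81)],
    [("dog", 0.95)],
    [("car", 0.60)],
]
print(merge_withscores(replies, 3))  # [('cat', 0.98), ('dog', 0.95), ('cow', 0.81)]
```

Because `WITHSCORES` values are comparable across keys, the merged ranking is the same one a single combined key would produce.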

There are a few interesting facts to note about this pattern:

1. It is possible to have a logical vector set that is as big as the sum of all the Redis instances we are using.
2. Deletion operations remain simple: we can hash the item and select the key where it belongs.
3. However, even if I use 10 different Redis instances, I'm not going to reach 10x the **read** operations per second compared to using a single server: for each logical query, I need to query all the instances. Yet, smaller graphs are faster to navigate, so there is some win even from the point of view of CPU usage.
4. Insertions, that is **write** queries, scale linearly: I can add N items against N instances at the same time, splitting the insertion load evenly. This is very important since vector sets, being based on HNSW data structures, are slower to add items than to query for similar items, by a very big factor.
5. While it cannot always guarantee the best results, with proper timeout management this system may be considered *highly available*, since if only a subset of the N instances is reachable, I'll still be able to return items similar to my query vector.

Notably, this pattern can be implemented in a way that avoids paying the sum of the round-trip times to all the servers: it is possible to send the queries to all the instances at the same time, so that the latency will be equal to the slowest reply out of the N server queries.

# Optimizing memory usage

Vector Sets, or rather HNSWs, the underlying data structure they use, combined with the features provided by Vector Sets themselves (quantization, random projection, filtering, ...) form an implementation with a non-trivial space of tunable parameters. Despite the complexity of the implementation and of vector similarity problems, here is a list of simple ideas that can help the user pick the best settings:

* 8-bit quantization (the default) is almost always a win. It reduces the memory usage of vectors by a factor of 4, yet the performance penalty in terms of recall is minimal. It also reduces insertion and search time by around 2 times or more.
* Binary quantization is much more extreme: it makes vector sets a lot faster, but noticeably increases the recall error, for instance from 95% to 80% if all the other parameters remain the same. Yet, the speedup is really big, and the memory usage of vectors, compared to full-precision vectors, is 32 times smaller.
* Vectors are not the only contributor to the high per-entry memory usage of Vector Sets: nodes contain, on average, `M*2 + M*0.33` pointers, where M is 16 by default (but can be tuned in `VADD`, see the `M` option). Each node also stores the string item and the optional JSON attributes: those should be as small as possible in order to avoid contributing more to the memory usage.
* The `M` parameter should be increased to 32 or more only when near-perfect recall is really needed.
* It is possible to gain space (less memory usage) by sacrificing time (more CPU time): use a low `M` (the default of 16, for instance) together with a high `EF` (the effort parameter of `VSIM`) in order to scan the graph more deeply.
* When memory usage is a serious concern, and we suspect that the vectors we are storing don't contain enough information (at least for our use case) to justify the number of components they have, random projection (the `REDUCE` option of `VADD`) could be tested to see if dimensionality reduction is possible with acceptable precision loss.
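To make the savings concrete, here is a rough sketch of the three vector representations for a 300-component vector. The quantization functions are simplified illustrations; the exact scheme Redis uses internally may differ:

```python
import struct

def quantize_int8(vec):
    """Map each component to a signed 8-bit value using a simple
    symmetric max-abs scale (a sketch, not Redis' exact scheme)."""
    scale = max(abs(x) for x in vec) or 1.0
    return bytes(round(x / scale * 127) & 0xFF for x in vec), scale

def quantize_binary(vec):
    """One bit per component: just the sign, 8 components per byte."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits.to_bytes((len(vec) + 7) // 8, "little")

vec = [0.5, -1.2, 0.0, 3.1] * 75           # a 300-component vector
fp32 = struct.pack(f"{len(vec)}f", *vec)   # full precision: 4 bytes/component
q8, _ = quantize_int8(vec)                 # int8: 1 byte/component (4x smaller)
qbin = quantize_binary(vec)                # binary: 1 bit/component (32x smaller)
print(len(fp32), len(q8), len(qbin))       # 1200 300 38
```

The per-vector saving is exactly the 4x and 32x factors mentioned above; the per-node pointer, item-string and attribute overhead is unaffected by quantization.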

## Random projection tradeoffs

Sometimes learned vectors are not as information-dense as we might guess: there
are components having similar meanings in the space, and components
whose values don't really represent features that matter for our use case.

At the same time, certain vectors are very big, 1024 components or more. In these cases, it is possible to use the random projection feature of Redis Vector Sets in order to save both space (less RAM used) and time (more operations per second). The feature is accessible via the `REDUCE` option of the `VADD` command. However, keep in mind that you need to test how much the reduction impacts the performance of your vectors in terms of recall and quality of the results you get back.

## What is a random projection?

The concept of random projection is relatively simple to grasp. For instance, a projection that turns a 100-component vector into a 10-component vector computes each of the 10 target components as a different random linear combination of the original 100 components. Please note that *each of the target components* gets some random amount of all the 100 original components. It is mathematically proven that this process results in a vector space where elements still have similar distances among them, but some information will get lost.
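A toy illustration of the idea in pure Python (not the module's actual implementation): a random Gaussian matrix maps 100 components down to 10, and distances between projected vectors remain roughly comparable to the original ones, with some distortion.

```python
import math
import random

def random_projection_matrix(src_dim, dst_dim, seed=42):
    """Each target component is a random linear combination of all the
    source components (Gaussian entries, scaled by 1/sqrt(dst_dim))."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) / math.sqrt(dst_dim) for _ in range(src_dim)]
            for _ in range(dst_dim)]

def project(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(100)]
b = [rng.uniform(-1, 1) for _ in range(100)]
P = random_projection_matrix(100, 10)

# The projected distance approximates the original one (with distortion).
print(dist(a, b), dist(project(P, a), project(P, b)))
```

The more aggressive the reduction (smaller `dst_dim`), the larger the expected distortion, which is exactly the recall loss measured in the next section.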

## Examples of projections and loss of precision

To show an extreme case, let's take the 3 million items of Word2Vec and compress their vectors from 300 components down to 100, 50 and 25 components. Then, we check the recall against the ground truth for each of the vector sets produced in this way (using different `REDUCE` parameters of `VADD`). These are the results, obtained asking for the top 10 elements.

```
----------------------------------------------------------------------
Key                             Average Recall %     Std Dev
----------------------------------------------------------------------
word_embeddings_int8            95.98                12.14
  ^ This is the same key used for ground truth, but without TRUTH option
word_embeddings_reduced_100     40.20                20.13
word_embeddings_reduced_50      24.42                16.89
word_embeddings_reduced_25      14.31                 9.99
```

Here the dimensionality reduction we are using is quite extreme: going from 300 to 100 components means dropping 66.6% of the original components. The recall drops from 96% to 40%, then down to 24% and 14% for the even more extreme reductions.
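For reference, the recall reported above can be measured client-side by comparing a normal `VSIM` reply against the `VSIM ... TRUTH` reply for the same query. A minimal sketch, with made-up element names:

```python
def recall(approx_ids, truth_ids):
    """Fraction of the ground-truth top-K elements that the
    approximate search also returned."""
    return len(set(approx_ids) & set(truth_ids)) / len(truth_ids)

# E.g. a VSIM top-10 reply vs the VSIM ... TRUTH top-10 for the same query:
truth  = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
approx = ["a", "b", "c", "x", "e", "f", "y", "h", "i", "j"]
print(recall(approx, truth))  # 0.8
```

Averaging this value over many random queries gives the "Average Recall %" column of the table.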

Reducing the dimension of vectors that are already relatively small, like the 300-component vectors of the example above, will provide only a relatively small memory saving, especially because by default Vector Sets use `int8` quantization, which uses only one byte per component:

```
> MEMORY USAGE word_embeddings_int8
(integer) 3107002888
> MEMORY USAGE word_embeddings_reduced_100
(integer) 2507122888
```

Of course, going from 2048-component vectors to 1024, for example, would provide a much more significant memory saving, even with the `int8` quantization used by Vector Sets, assuming the recall loss is acceptable. Besides the memory saving, there is also a reduction in CPU time, translating to more operations per second.

Another thing to note is that, with certain embedding models, binary quantization (which offers an 8x reduction of memory usage compared to 8-bit quantization, and a very big speedup in computation) performs much better than shrinking the vectors by a comparable amount via random projection:

```
word_embeddings_bin             35.48                19.78
```

Here, in the same test as above, we get a 35% recall, which is not too far from the 40% obtained with a random projection from 300 to 100 components. However, while the projection reduces the vector size by 3 times, binary quantization reduces it by 8 times.

```
> MEMORY USAGE word_embeddings_bin
(integer) 2327002888
```

In this specific case the key uses JSON attributes and has a graph connection overhead that is much bigger than the 300 bits each vector takes; but, as already said, for big vectors (1024 components, for instance) or for lower values of `M` (see `VADD`: the `M` parameter controls the level of connectivity, so it changes the number of pointers used per node) the memory saving is much larger.

# Vector Sets troubleshooting and understandability

## Debugging poor recall or unexpected results

Vector graphs and similarity queries pose many challenges, mainly due to the following three problems:

1. The error due to the approximated nature of Vector Sets is hard to evaluate.
2. The error added by quantization often depends on the exact vector space (the embedding we are using **and** how far apart the elements we represent in such embeddings are).
3. We live in the illusion that learned embeddings capture the best similarity possible among elements, which is obviously not always true, and highly application dependent.

The only way to debug such problems is to inspect, step by step, what is happening inside our application and inside the structure of the HNSW graph itself. To do so, we suggest considering the following tools:

1. The `TRUTH` option of the `VSIM` command returns the ground truth of the most similar elements, performing a linear scan instead of using the HNSW graph.
2. The `VLINKS` command allows you to explore the graph, to see if the connections among nodes make sense, and to investigate why a given node may be more isolated than expected. Such a command can also be used in a different way, when we want very fast "similar items" results without paying the HNSW traversal time: it exploits the fact that we have a direct reference from each element in our vector set to each node in our HNSW graph.
3. The `WITHSCORES` option, in the commands that support it, returns a value that is directly related to the *cosine similarity* between the query and item vectors: the similarity is simply rescaled from the original [-1, 1] range to [0, 1], otherwise the metric is identical.
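The rescaling mentioned in point 3 is easy to invert on the client side whenever the raw cosine similarity is needed:

```python
def similarity_from_score(score):
    """Map a WITHSCORES value in [0, 1] back to cosine similarity in [-1, 1]."""
    return score * 2 - 1

def score_from_similarity(cos):
    """Map a cosine similarity in [-1, 1] to the [0, 1] score Redis returns."""
    return (cos + 1) / 2

print(similarity_from_score(1.0))   # 1.0  -> same direction
print(similarity_from_score(0.5))   # 0.0  -> orthogonal vectors
print(similarity_from_score(0.0))   # -1.0 -> opposite direction
```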

## Clients, latency and bandwidth usage

During Vector Sets testing, we discovered that clients often introduce considerable latency and CPU usage (on the client side, not in Redis) for two main reasons:

1. The serialization of vectors into the `VALUES ... list of floats ...` format can be very slow.
2. The vector payload of floats represented as strings is very large, resulting in high bandwidth usage and latency compared to other Redis commands.

Switching from `VALUES` to `FP32` as the method for transmitting vectors may easily provide 10-20x speedups.
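The payload difference alone is easy to see. This sketch packs a 300-component vector as an FP32 blob versus rendering it as decimal strings; it assumes little-endian 32-bit floats, so check the conventions of your client library and platform:

```python
import struct

vec = [0.12345678] * 300  # a 300-component embedding

# VALUES: each float travels as a decimal string argument.
values_payload = " ".join(repr(x) for x in vec)

# FP32: the whole vector is one binary blob, 4 bytes per component.
fp32_payload = struct.pack(f"<{len(vec)}f", *vec)

print(len(values_payload), len(fp32_payload))  # 3299 1200
```

On top of the smaller payload, the client also skips the float-to-string and string-to-float conversions entirely, which is where much of the CPU time goes.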

# Implementation details

Vector sets are based on the `hnsw.c` implementation of the HNSW data structure, with extensions for speed and functionality.

The main features are:

* Proper node deletion with relinking.
* 8-bit and binary quantization.
* Threaded queries.
* Filtered search with predicate callback.