When you are dealing with a lot of data, you often need a way to
check whether a value is present in a set of values.
A typical use case is when your data is sharded among
several nodes, and you want to avoid querying every node for each incoming
request.
A naive approach would be to store all the data in an in-memory list.
But here we are really talking about a lot of data, and it would simply not fit
in memory.
We may say that it is the purpose of a database to maintain such a
list.
So you start a good CREATE TABLE on your RDBMS with a single indexed primary
key column, fill it with your data, and run a proper SELECT.
But it takes a lot of storage, insertion is slow, and this database becomes a
bottleneck.
Then you consider using some NoSQL database like Redis.
It is faster than an RDBMS, but it tends to use a lot of memory, and updating
the values is still resource-consuming.
Now comes the Bloom Filter magic.
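It lets you record the presence of a large number of values in a small amount
of memory, at the cost of a predefined ratio of potential false positives:
a lookup may report an absent value as present, but it never reports a stored
value as absent.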
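To make the idea concrete, here is a minimal Python sketch of a generic Bloom filter. It is not the TSynBloomFilter implementation: the class name, method names, and the choice of SHA-256 with double hashing are assumptions made for illustration only. The bit array size and the number of hash functions are derived from the expected item count and the target false positive ratio using the standard sizing formulas.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical names, not TSynBloomFilter)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.bit_count = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.hash_count = max(1, round(self.bit_count / n * math.log(2)))
        self.bits = bytearray((self.bit_count + 7) // 8)

    def _positions(self, value: bytes):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(value).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bit_count

    def insert(self, value: bytes) -> None:
        # Set every bit selected by the hash functions.
        for pos in self._positions(value):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def may_exist(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(value)
        )


# Track 10,000 keys with a target false positive ratio of 1%.
bloom = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
for i in range(10_000):
    bloom.insert(f"key-{i}".encode())
```

In the sharding scenario, each node could publish such a filter, so that a request is only forwarded to the nodes whose filter answers "probably present". With these parameters the bit array takes roughly 12 KB, far less than storing the keys themselves.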
We just introduced a TSynBloomFilter class in our Open Source mORMot framework
trunk, which features an optimized and self-tuning Bloom Filter storage, with
potential low-bandwidth synchronization over the wire.