Archives - July 2016 - Synopse Open Source

When you are dealing with a lot of data, you often need a way to check whether a value is present in a set of values.
A typical use case is data sharded among several nodes, where you want to avoid asking every node about every incoming request.

A naive approach could be to store all the data in an in-memory list.
But here we are really talking about a lot of data, and it would simply not fit in memory.

We may say that maintaining such a list is the purpose of a database.
So you run a good old CREATE TABLE on your RDBMS with a single indexed primary key column, fill it with your data, and run a proper SELECT.
But it takes a lot of storage, insertion is slow, and the database becomes a bottleneck.

Then you consider using a NoSQL database like Redis.
It is faster than an RDBMS, but it tends to use a lot of memory, and updating the values is still resource-consuming.

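Now comes the Bloom Filter magic.
It lets you record the presence of a huge number of values in a small amount of memory, with a predefined ratio of potential false positives: a lookup may wrongly report a value as present, but never reports a stored value as absent.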
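
To make the idea concrete, here is a minimal Python sketch of a generic Bloom filter. It is only an illustration of the technique, not the TSynBloomFilter code: the class name, the method names, the SHA-256 based double hashing and the textbook sizing formulas are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical class, not TSynBloomFilter)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Textbook sizing for n items and target false positive ratio p:
        #   bits   m = -n * ln(p) / (ln 2)^2
        #   hashes k = (m / n) * ln 2
        ln2 = math.log(2)
        self.bit_count = max(
            8, math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2)
        )
        self.hash_count = max(1, round(self.bit_count / expected_items * ln2))
        self.bits = bytearray((self.bit_count + 7) // 8)

    def _positions(self, value: bytes):
        # Double hashing: derive all k bit positions from two 64-bit halves
        # of a single SHA-256 digest (an assumption made for this sketch).
        digest = hashlib.sha256(value).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bit_count

    def add(self, value: bytes) -> None:
        # Set the k bits associated with this value.
        for pos in self._positions(value):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def may_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(value)
        )
```

With these formulas, 1,000 expected items at a 1% false positive ratio need 9,586 bits (about 1.2 KB) and 7 hash positions per value, whatever the size of the values themselves. In the sharded scenario above, each node could publish such a filter, so that a request is only forwarded to the nodes whose filter answers "probably present".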

We just introduced a TSynBloomFilter class in our Open Source mORMot framework trunk, which features an optimized and self-tuning Bloom Filter storage, with potential low-bandwidth synchronization over the wire.

Bloom Filter and Big Data