Bloom filters

Bloom filters are a data structure that can quickly answer whether an element is a member of a set. Because they are probabilistic, they tell you whether an element is maybe in a set, or definitely not in a set.

At their core, Bloom filters work by hashing elements and storing these hashes. Because hash functions map a large input space to a fixed-size output space, the set of hashes is much smaller than if you stored the original, unhashed elements. In practice, Bloom filters use several hash functions to make false positives less likely.

This data structure has many use cases, including:

  • In some key-value stores like Cassandra or ScyllaDB, trying to read a nonexistent value requires slow disk reads. Both of these use Bloom filters to make this fast.
  • Firefox uses Bloom filters to prevent malicious add-ons from running.
  • CDNs may use Bloom filters to see if a URL has been visited before, which helps decide whether to cache it or not.
  • The blogging site Medium uses Bloom filters to ensure that they don’t recommend articles that a user has already read.