Channel: Joab Jackson, Author at The New Stack

Valkey Bloom Filter Detects Fraud (While Not Breaking the Bank)


Just in time for spring (if you live in the Northern Hemisphere), the Valkey open source key-value datastore now supports a new data type, Bloom filters.

The newly released valkey-bloom module works with Valkey 8.0 and later. (The most recent version of the data store, v8.1, was released last month.)

A Bloom filter is a probabilistic data structure that offers a very efficient way to determine whether a given value is a member of a data set. It does not locate or count every instance of a value in a mass of data. Instead, it tells you, very cheaply, that a value is either definitely absent or possibly present.

A Bloom filter is not as accurate as a full database search: it may report false positives, but never false negatives. In exchange, this approach can cut memory requirements by over 93%.


A Bloom filter in action (Wikipedia).

In other words, when the data is not there, a Bloom filter can let you know without churning through a lot of expensive disk I/O operations first.

High-Volume Membership Testing

In the right circumstances, developers can find this feature extremely useful, Valkey project maintainer Madelyn Olson noted in a LinkedIn post.

For instance, a Bloom filter could be used to prevent showing an online ad to the same user twice. It could also be used to prevent users from visiting malicious URLs or help banks identify fraudulent credit card transactions.
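The ad example maps naturally onto a single check-and-set step: setting an item's bits and noting whether any of them were previously unset tells you, in one pass, whether the item was definitely new. The Python sketch below assumes a fixed-size filter keyed on hypothetical `user|ad` strings; the class and function names, the 2^20-bit size, and the BLAKE2b-based hashing are illustrative choices, not valkey-bloom internals.

```python
import hashlib


class SeenFilter:
    """Hypothetical fixed-size Bloom filter that remembers (user, ad) pairs."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def add_if_new(self, item: str) -> bool:
        """Set the item's bits; return True if at least one bit was previously unset.

        True means the item was definitely new. False means it was possibly seen
        before (a false positive can make a genuinely new item look familiar).
        """
        new = False
        for i in range(self.num_hashes):
            # k hash functions derived by prefixing the item with the hash index.
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            pos = int.from_bytes(digest, "big") % self.num_bits
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                new = True
                self.bits[byte] |= mask
        return new


seen = SeenFilter()


def should_show(user_id: str, ad_id: str) -> bool:
    # Show the ad only if this (user, ad) pair has definitely not been recorded yet.
    return seen.add_if_new(f"{user_id}|{ad_id}")


print(should_show("alice", "ad-42"))  # True: first impression
print(should_show("alice", "ad-42"))  # False: already recorded
```

Because a Bloom filter has no false negatives, a pair that has been recorded is never shown again; the cost of a false positive is only that a user occasionally misses an ad they have not actually seen, which is usually the acceptable direction of error for deduplication.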

“These use cases could have been implemented with bitmaps or sets, but this new datatype makes it much simpler and efficient to implement and use in your application,” Olson wrote.

The module, written in Rust and released under an open source BSD-2-Clause license, includes commands for creating, inserting and querying Bloom filters.

Event Deduplication

As an open source project, Valkey was launched in 2024 as a fork of the Redis key-value store, after the company behind Redis switched the data store away from an open source license, citing competitive pressures from cloud services.

In interviews with TNS, the founders of the open source project talked about advancing the project’s development at a speedier pace than Redis itself had become accustomed to.

In this case, however, Redis has had its own Bloom filter since 2022.

Other database systems that support the data type include PostgreSQL, Apache Cassandra and RocksDB. It can also be implemented in client-side libraries, though this approach tends not to be as performant.

The valkey-bloom plugin “offers an efficient solution for high-volume membership testing through bloom filters, providing significant memory usage savings compared to traditional data types,” wrote Valkey contributor and Amazon Web Services software engineer Karthik Subbarao, in a blog post explaining the technology in greater depth.

In one sample set, Subbarao showed that a Bloom filter can hold 448 million items within a relatively modest 512MB limit.
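As a back-of-the-envelope check on that figure, the textbook sizing rule for an optimally configured Bloom filter, m = -n ln(p) / (ln 2)^2, can be solved for the capacity n. The sketch below assumes a 1% false-positive rate and reads 512MB as 512 MiB; both are our assumptions, since the article does not state the parameters of the sample, and a real implementation such as valkey-bloom may carry extra overhead.

```python
import math


def bloom_capacity(memory_bytes: int, error_rate: float) -> int:
    """Items an optimally sized Bloom filter can hold in memory_bytes at error_rate.

    Solves the textbook rule m = -n ln(p) / (ln 2)^2 for n (m is in bits).
    """
    bits = memory_bytes * 8
    return int(bits * math.log(2) ** 2 / -math.log(error_rate))


def optimal_hashes(memory_bytes: int, items: int) -> int:
    # k = (m / n) ln 2, rounded to a whole number of hash functions.
    return max(1, round(memory_bytes * 8 / items * math.log(2)))


memory = 512 * 1024**2  # assumption: 512MB read as 512 MiB
capacity = bloom_capacity(memory, 0.01)  # assumption: 1% false-positive rate
print(f"capacity: {capacity:,} items")  # roughly 448 million
print(f"bits per item: {memory * 8 / capacity:.2f}")  # about 9.6
print(f"hash functions: {optimal_hashes(memory, capacity)}")  # 7
```

Under those assumptions the result lands at roughly 448 million items, or about 9.6 bits per item. Storing the same items as, say, 16-byte identifiers in a plain set would need at least 128 bits each before any data-structure overhead, which illustrates where the large memory savings come from.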

“This enhances Valkey’s capability to handle various workloads including large-scale advertisement / event deduplication, fraud detection, and reducing disk / backend lookups more efficiently,” he wrote.

The post Valkey Bloom Filter Detects Fraud (While Not Breaking the Bank) appeared first on The New Stack.

The new data type offers efficient probabilistic membership testing, with significant memory savings, for use cases such as fraud detection and ad deduplication.
