BLOOM FILTER

Dr. CHANDRALEKHA M
ASSISTANT PROFESSOR
DEPT. OF COMPUTER SCIENCE AND ENGINEERING
AMRITA SCHOOL OF COMPUTING, CHENNAI CAMPUS
Mob. No: +91 9442414745
1. Suppose you are creating a Gmail account and want a cool username.
You enter it and get the message “Username is already taken”.
2. You add your birth date to the username; still no luck.
3. You then add your university roll number as well, and still get
“Username is already taken”.
4. It’s really frustrating, isn’t it?
5. But have you ever thought about how Gmail checks the availability
of a username so quickly, searching through the millions of usernames
registered with it?
There are many ways to do this job –
Linear search : Bad idea! In the worst case, every registered username
has to be compared.
Binary search : Store all usernames in alphabetical order and compare the
entered username with the middle one in the list. If they match, the
username is taken. Otherwise, work out whether the entered username comes
before or after the middle one. If it comes after, discard the middle
username and everything before it, then search the remaining half.
Repeat this process until you find a match or the search ends with no
match. This technique is better and promising, but it still requires
multiple steps.
But there must be something better!
A Bloom filter is a data structure that can do this job.
LINEAR SEARCH
Let the elements of the array be – (array figure not reproduced)
• Let the element to be searched for be K = 41.
• Start from the first element and compare K with each element of
the array.
• The value of K, i.e., 41, does not match the first element of the
array.
• So, move to the next element, and follow the same process until the
matching element is found.
Once the element is found, the algorithm returns the index of the matching element.
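The walk-through above can be sketched in Python as follows. Since the array from the slide figure is not available, the values in `data` are hypothetical and only chosen so that K = 41 appears somewhere after the first element; the function name `linear_search` is likewise an assumption.

```python
from typing import Optional, Sequence


def linear_search(items: Sequence[int], key: int) -> Optional[int]:
    """Return the index of the first element equal to key, or None."""
    for index, value in enumerate(items):
        if value == key:
            return index  # match found: report its position
    return None  # reached the end without a match


# Hypothetical array (the slide's original figure is not reproduced).
data = [70, 40, 30, 11, 57, 41, 25, 14, 52]
found_at = linear_search(data, 41)
print(found_at)
```

In the worst case every element is compared once, which is why this approach scales poorly to millions of usernames.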
BINARY SEARCH
Let the elements of the array be – (array figure not reproduced)
Let the element to search for be K = 56.
We use the formula below to calculate the middle index of the array –
mid = (beg + end)/2
So, in the given array –
beg = 0
end = 8
mid = (0 + 8)/2 = 4. So, 4 is the middle index of the array.
Once the element is found, the algorithm returns the index of the matching
element.
Binary search is implemented using the following steps:
Step 1 - Read the search element from the user.
Step 2 - Find the middle element in the sorted list.
Step 3 - Compare the search element with the middle element in the sorted list.
Step 4 - If both are matched, then display "Given element is found!!!" and
terminate the function.
Step 5 - If both are not matched, then check whether the search element is smaller
or larger than the middle element.
Step 6 - If the search element is smaller than middle element, repeat steps 2, 3, 4
and 5 for the left sublist of the middle element.
Step 7 - If the search element is larger than middle element, repeat steps 2, 3, 4 and
5 for the right sublist of the middle element.
Step 8 - Repeat the same process until we find the search element in the list or
until the sublist contains only one element.
Step 9 - If that element also doesn't match the search element, then
display "Element is not found in the list!!!" and terminate the function.
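A minimal Python sketch of the steps above, using the same `beg`, `end` and `mid` names as the formula. The sorted array is hypothetical (the slide figure is not available); it has nine elements so that `beg = 0` and `end = 8` as in the text, and it returns `None` rather than displaying a message.

```python
from typing import Optional, Sequence


def binary_search(sorted_items: Sequence[int], key: int) -> Optional[int]:
    """Return the index of key in sorted_items, or None if it is absent."""
    beg, end = 0, len(sorted_items) - 1
    while beg <= end:
        mid = (beg + end) // 2  # integer midpoint of the current sublist
        if sorted_items[mid] == key:
            return mid
        if key < sorted_items[mid]:
            end = mid - 1  # continue in the left sublist
        else:
            beg = mid + 1  # continue in the right sublist
    return None  # sublist became empty: element is not in the list


# Hypothetical sorted array with indices 0..8.
data = [10, 12, 24, 29, 39, 40, 51, 56, 69]
found_at = binary_search(data, 56)
print(found_at)
```

Each comparison halves the remaining sublist, so the number of steps grows with the logarithm of the list size rather than with the size itself.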
What is a Bloom Filter?
• A Bloom filter is a space-efficient probabilistic data structure
(a data structure that provides approximate answers to queries
about a large dataset, rather than exact answers) that tells
whether an element may be in a set or is definitely not in it.
• If we look up an item in the Bloom filter, we can get two
possible results.
✓The item is not present in the set: always a true negative.
✓The item might be present in the set: either a false
positive or a true positive.
• For example, checking the availability of a username is a set membership
problem, where the set is the list of all registered usernames.
• The price we pay for efficiency is that it is probabilistic in nature,
which means there might be some false positive results.
• A false positive means the filter might say that a given username is
already taken when actually it is not.
Properties of Bloom Filters
• Unlike a standard hash table, a Bloom filter of a fixed size can
represent a set with an arbitrarily large number of elements.
• Adding an element never fails. However, the false positive rate
increases steadily as elements are added, until all bits in the filter are
set to 1, at which point every query yields a positive result.
• Bloom filters never generate a false negative result, i.e., they never
tell you that a username doesn’t exist when it actually does.
• Deleting elements from the filter is not possible because, if we delete a
single element by clearing the bits at the indices generated by the k hash
functions, we might also delete other elements that share those bits.
• Example – if we delete “geeks” (in the example given below) by clearing
the bits at 1, 4 and 7, we might end up deleting “nerd” as well, because
the bit at index 4 becomes 0 and the Bloom filter then claims that “nerd”
is not present.
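The deletion problem can be reproduced with a short Python sketch. The bit indices are copied from this document's “geeks” and “nerd” example (a 10-bit array and 3 hash functions); the helper names, and especially `naive_delete`, are hypothetical and exist only to show why clearing bits is unsafe.

```python
# Bit indices taken from the document's example; no real hashing is done here.
INDICES = {"geeks": (1, 4, 7), "nerd": (3, 5, 4)}


def insert(bits, word):
    """Set every bit that belongs to word."""
    for i in INDICES[word]:
        bits[i] = 1


def lookup(bits, word):
    """True if all of word's bits are set (word is possibly present)."""
    return all(bits[i] for i in INDICES[word])


def naive_delete(bits, word):
    """Hypothetical (broken) delete: clear every bit that belongs to word."""
    for i in INDICES[word]:
        bits[i] = 0


bits = [0] * 10
insert(bits, "geeks")
insert(bits, "nerd")
before = lookup(bits, "nerd")  # "nerd" is reported as possibly present
naive_delete(bits, "geeks")    # clears indices 1, 4 and 7
after = lookup(bits, "nerd")   # index 4 was shared, so "nerd" is now lost
print(before, after)
```

After the naive delete, the filter gives a false negative for “nerd”, which breaks the guarantee that a Bloom filter never reports a stored element as absent.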
Working of Bloom Filter
An empty Bloom filter is a bit array of m bits, all set to zero.
• We need k hash functions to calculate the hashes for a
given input.
• When we want to add an item x to the filter, the bits at the k indices
h1(x), h2(x), … hk(x) are set to 1, where the indices are calculated using
the hash functions.
Example – Suppose we want to enter “geeks” in the filter. We are using
3 hash functions and a bit array of length 10, all set to 0 initially. First
we calculate the hashes as follows:
h1(“geeks”) % 10 = 1
h2(“geeks”) % 10 = 4
h3(“geeks”) % 10 = 7
Note: These outputs are made-up values, for explanation only.
Now we set the bits at indices 1, 4 and 7 to 1.
Next, we want to enter “nerd”. Similarly, we calculate the hashes:
h1(“nerd”) % 10 = 3
h2(“nerd”) % 10 = 5
h3(“nerd”) % 10 = 4
Set the bits at indices 3, 5 and 4 to 1.
• Now suppose we want to check whether “geeks” is present in the filter.
We follow the same process, but this time we read the bits instead of
setting them.
• We calculate the respective hashes using h1, h2 and h3 and check whether
all these indices are set to 1 in the bit array.
• If all the bits are set, then we can say that “geeks” is probably present.
• If any of the bits at these indices is 0, then “geeks” is definitely not
present.
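The insert and lookup procedure described above might be sketched in Python like this. It is a minimal illustration, not a production implementation: the class name `BloomFilter` and the way the k hash functions are derived (SHA-256 of the item prefixed with the hash function's number) are assumptions made for this example, so the indices it produces will not match the made-up values in the text.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m            # number of bits in the array
        self.k = k            # number of hash functions
        self.bits = [0] * m   # empty filter: all bits are zero

    def _indices(self, item: str) -> Iterator[int]:
        """Yield h1(item) % m, ..., hk(item) % m (assumed hash construction)."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        """Set the k bits that belong to item."""
        for index in self._indices(item):
            self.bits[index] = 1

    def lookup(self, item: str) -> bool:
        """False: definitely not present. True: probably present."""
        return all(self.bits[index] for index in self._indices(item))


bf = BloomFilter(m=1000, k=3)
bf.insert("geeks")
bf.insert("nerd")
print(bf.lookup("geeks"), bf.lookup("nerd"))
```

Both lookups report the items as probably present, because every bit set during `insert` is still set when `lookup` reads it. A lookup for an item that was never inserted usually returns `False`, but it can return `True` if other items happen to have set all of its bits.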
False Positive in Bloom Filters
The question is why we said “probably present”. Why this uncertainty?
Let’s understand this with an example.
Suppose we want to check whether “cat” is present or not.
We calculate the hashes using h1, h2 and h3:
h1(“cat”) % 10 = 1
h2(“cat”) % 10 = 3
h3(“cat”) % 10 = 7
• If we check the bit array, the bits at these indices are set to 1, but we
know that “cat” was never added to the filter.
• The bits at indices 1 and 7 were set when we added “geeks”, and bit 3 was
set when we added “nerd”.
• So, because the bits at the calculated indices were already set by other
items, the Bloom filter erroneously claims that “cat” is present, which is
a false positive result.
• Depending on the application, this could be a huge downside or relatively
okay.
• We can control the probability of getting a false positive by
controlling the size of the Bloom filter.
• More space means fewer false positives.
• To decrease the probability of a false positive, we have to use a larger
bit array together with a suitable number of hash functions. Beyond a
certain point, extra hash functions fill the array faster and make false
positives more likely.
• Each additional hash function also adds latency to both adding an item
and checking membership.
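The trade-off between the bit array size m, the number of hash functions k and the number of inserted items n can be estimated numerically. The sketch below uses the standard approximation p ≈ (1 − e^(−kn/m))^k for the false positive probability and k ≈ (m/n)·ln 2 for the number of hash functions that minimises it; neither formula appears in the slides, and the function names are assumptions.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive probability after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions that roughly minimises the rate (at least 1)."""
    return max(1, round((m / n) * math.log(2)))


# The document's setting: 10 bits, 3 hash functions, 2 inserted items.
small = false_positive_rate(m=10, k=3, n=2)
# Doubling the bit array lowers the rate.
larger = false_positive_rate(m=20, k=3, n=2)
# Too many hash functions on the same small array raise it again.
too_many = false_positive_rate(m=10, k=10, n=2)
print(small, larger, too_many, optimal_k(10, 2))
```

For the 10-bit example the estimate is roughly 0.09, and it drops when the array grows. Using 10 hash functions on the same array gives a higher estimate than using 3, which is why k has to be chosen to suit m and n rather than simply increased.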
Operations that a Bloom Filter supports
insert(x) : inserts an element into the Bloom filter.
lookup(x) : checks whether an element is possibly present in the Bloom
filter, with some probability of a false positive.
NOTE : We cannot delete an element from a Bloom filter.
Example of Bloom Filter
Suppose that the size of our bloom filter is m = 10.
Inserting an item to the bloom filter
• For example, we want to add the word “coding”.
• After passing it through three hash functions, we get the following
results.
h1(“coding”) = 125
h2(“coding”) = 67
h3(“coding”) = 19
• We need to take each of these values mod 10 so that the index is
within the bounds of the Bloom filter.
• Therefore, the bits at indexes 125 % 10 = 5, 67 % 10 = 7 and 19 % 10 = 9
have to be set to 1.
Testing membership of an item in Bloom filter
• If we want to test the membership of an element, we need to pass it
through same hash functions.
• If bits are already set for all these indexes, then this element might
exist in the set.
• However, even if one index is not set, we are sure that this element is
not present in the set.
• Let’s say we want to check the membership of “cat” in our set.
• Furthermore, we have already added two elements, “coding” and
“music”, to our set.
• “coding” has the hash outputs {125, 67, 19} from the three hash
functions, and as discussed above, the indexes {5, 7, 9} are set to 1.
• “music” has the hash outputs {290, 145, 2}, so the indexes {0, 5, 2} are
set to 1.
• We pass “cat” through the same hash functions and get the following
results.
h1(“cat”) = 233
h2(“cat”) = 155
h3(“cat”) = 9
• Taking each value mod 10, we check if the indexes {3, 5, 9} are all set to 1.
• As we can see, even though indexes 5 and 9 are set to 1, 3 is not.
• Thus, we can conclude with 100% certainty that “cat” is not present in
the set.
• Now let’s say we want to check existence of “gaming” in our set.
• We pass it through same hash functions and get the following results.
h1(“gaming”) = 235
h2(“gaming”) = 60
h3(“gaming”) = 22
• Taking each value mod 10 gives {5, 0, 2}, so we check if the indexes {0, 2, 5} are all set to 1.
• We can see that all of these indexes are set to 1.
• However, we know that “gaming” is not present in the set.
• So, this is a false positive.
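The whole example can be replayed in a few lines of Python. The hash outputs are copied directly from the text rather than computed by real hash functions, and the names `HASHES`, `indices` and `might_contain` are assumptions made for this sketch.

```python
M = 10  # size of the Bloom filter in the example

# Hash outputs h1, h2, h3 as given in the text (not computed here).
HASHES = {
    "coding": (125, 67, 19),
    "music": (290, 145, 2),
    "cat": (233, 155, 9),
    "gaming": (235, 60, 22),
}


def indices(word):
    """Bit positions for word: each hash output taken mod M."""
    return sorted({h % M for h in HASHES[word]})


bits = [0] * M
for word in ("coding", "music"):  # only these two are actually inserted
    for i in indices(word):
        bits[i] = 1


def might_contain(word):
    """True if every bit for word is set, i.e. word is possibly present."""
    return all(bits[i] for i in indices(word))


print(indices("cat"), might_contain("cat"))
print(indices("gaming"), might_contain("gaming"))
```

“cat” maps to {3, 5, 9} and is rejected because bit 3 is still 0, while “gaming” maps to {0, 2, 5}, all of which were set by “coding” and “music”, so the filter reports it as possibly present even though it was never inserted.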
Applications of Bloom Filter
• Weak password detection
• Internet Cache Protocol
• Safe browsing in Google Chrome
• Wallet synchronization in Bitcoin
• Hash based IP Traceback
• Cyber security, such as virus scanning, worm detection and DDoS prevention
• Risky URL detection
• Determining whether a user ID or domain is already taken
• Filtering out previously shown posts on recommendation engines
• Checking words for misspellings and profanity with a spellchecker
• Identifying malicious URLs, blocked IPs, and fraudulent transactions
• Databases: Many popular databases use Bloom filters to reduce
costly disk lookups for non-existent rows or columns. This technique
is used by PostgreSQL, Apache Cassandra, Cloud Bigtable, etc.
Advantages of Bloom filter:
1. It uses a fixed amount of space, regardless of the number of elements
inserted.
2. No false negatives, so you can trust the Bloom filter when it says the
item does not exist.
3. Adding an element never fails.
4. It does not store the actual elements, which gives a degree of privacy
out of the box.
Disadvantages of Bloom filter:
1. It can return false positives, so you can’t always trust the Bloom filter
when it says the element exists.
2. Adding elements never fails, but at the cost of an ever-increasing false
positive rate.
3. Reducing the false-positive rate requires an additional bit array or
recreation of the Bloom filter.
4. The inserted elements cannot be retrieved.
5. The inserted elements cannot be deleted.
THANK YOU ☺
