04 3 Hashing Search Substring

The document discusses substring searching algorithms, focusing on the naive approach and Rabin-Karp's algorithm. It outlines how to find all occurrences of a substring in a given text and presents the running time complexities of these methods. The Rabin-Karp algorithm improves efficiency by using hashing to reduce the number of direct comparisons needed.

Hashing:

Substring Search

Michael Levin
Department of Computer Science and Engineering
University of California, San Diego

Data Structures Fundamentals

Algorithms and Data Structures
Outline

1 Find Substring in Text

2 Rabin-Karp’s Algorithm

3 Recurrence Equation for Substring Hashes

4 Improving Running Time

Searching for Substring
Given a text T (website, book, Amazon
product page) and a string P (word, phrase,
sentence), find all occurrences of P in T.

Examples
Specific term in Wikipedia article
Gene in a genome
Detect files infected by virus — code
patterns
Substring Notation
Definition
Denote by S[i..j] the substring of string S
starting in position i and ending in position j.

Examples
If S = “hashing”, then
S[0..3] = “hash”,
S[4..6] = “ing”,
S[2..5] = “shin”.
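
In a language with half-open slices, such as Python, the inclusive range S[i..j] corresponds to the slice S[i:j + 1]. A minimal sketch of that mapping follows; the helper name `substring` is an assumption made for illustration and does not appear in the slides.

```python
def substring(s: str, i: int, j: int) -> str:
    """Return S[i..j] with both endpoints inclusive (the slide notation)."""
    return s[i:j + 1]


S = "hashing"
print(substring(S, 0, 3))  # hash
print(substring(S, 4, 6))  # ing
print(substring(S, 2, 5))  # shin
```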
Find Substring in String
Input: Strings T and P.
Output: All positions i in T, 0 ≤ i ≤ |T| − |P|, such that
T[i..i + |P| − 1] = P.
Naive Algorithm

For each position i from 0 to |T| − |P|, check whether T[i..i + |P| − 1] = P or not.
If yes, append i to the result.
AreEqual(S1, S2)
    if |S1| ≠ |S2|:
        return False
    for i from 0 to |S1| − 1:
        if S1[i] ≠ S2[i]:
            return False
    return True
FindSubstringNaive(T, P)
    positions ← empty list
    for i from 0 to |T| − |P|:
        if AreEqual(T[i..i + |P| − 1], P):
            positions.Append(i)
    return positions
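
The naive search can be sketched in Python as follows. This is one possible translation of the pseudocode, not a reference implementation; the snake_case function names are assumptions chosen to match Python conventions.

```python
def are_equal(s1: str, s2: str) -> bool:
    """Character-by-character comparison, mirroring AreEqual."""
    if len(s1) != len(s2):
        return False
    for a, b in zip(s1, s2):
        if a != b:
            return False
    return True


def find_substring_naive(text: str, pattern: str) -> list[int]:
    """Return every i such that text[i..i + |pattern| - 1] equals pattern."""
    positions = []
    # range() is empty when the pattern is longer than the text.
    for i in range(len(text) - len(pattern) + 1):
        if are_equal(text[i:i + len(pattern)], pattern):
            positions.append(i)
    return positions


print(find_substring_naive("abacaba", "aba"))  # [0, 4]
```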
Running Time
Lemma
The running time of FindSubstringNaive(T, P) is O(|T||P|).

Proof
Each AreEqual call is O(|P|).
There are |T| − |P| + 1 calls of AreEqual, which total O((|T| − |P| + 1)|P|) = O(|T||P|).
Bad Example
T = “aaa...aa” (very long)
P = “aaa...ab” (much shorter than T)
For each position i in T from 0 to |T| − |P|, the call to AreEqual has to make all |P| comparisons, because the difference is always in the last character.
Thus, in this case the naive algorithm runs in time Θ(|T||P|).
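
To make the bad case concrete, the sketch below counts character comparisons for such an input. The sizes n = 1000 and m = 10 and the helper name `count_comparisons` are illustrative assumptions, not values from the slides.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons made by the naive search."""
    comparisons = 0
    for i in range(len(text) - len(pattern) + 1):
        for k in range(len(pattern)):
            comparisons += 1
            if text[i + k] != pattern[k]:
                break  # AreEqual returns False at the first mismatch
    return comparisons


n, m = 1000, 10                  # illustrative sizes
T = "a" * n                      # "aaa...aa"
P = "a" * (m - 1) + "b"          # "aaa...ab"
print(count_comparisons(T, P))   # 9910
print((n - m + 1) * m)           # 9910, i.e. (|T| - |P| + 1) * |P|
```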
Rabin-Karp’s Algorithm
Compare P with all substrings S of T of length |P|
Idea: use hashing to make the comparisons faster
Comparing Hashes
If h(P) ≠ h(S), then definitely P ≠ S
If h(P) = h(S), call AreEqual(P, S) to check whether P = S or not
Use the polynomial hash family $\mathcal{P}_p$ with prime p
If P ≠ S, the probability of collision Pr[h(P) = h(S)] is at most |P|/p for polynomial hashing; it can be made small by choosing a very large prime p
RabinKarp(T, P)
    p ← big prime, x ← random(1, p − 1)
    positions ← empty list
    pHash ← PolyHash(P, p, x)
    for i from 0 to |T| − |P|:
        tHash ← PolyHash(T[i..i + |P| − 1], p, x)
        if pHash ≠ tHash:
            continue
        if AreEqual(T[i..i + |P| − 1], P):
            positions.Append(i)
    return positions
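
A minimal Python sketch of this first version follows. It assumes the prime p = 1 000 000 007 and character codes taken from `ord`; both are choices made for illustration, as are the function names. Every window is rehashed from scratch, so the cost is still O(|T||P|).

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Polynomial hash: sum of ord(s[i]) * x**i, taken mod p."""
    h = 0
    for ch in reversed(s):          # Horner's rule, highest power first
        h = (h * x + ord(ch)) % p
    return h


def rabin_karp_slow(text: str, pattern: str) -> list[int]:
    p = 1_000_000_007               # assumed "big prime"
    x = random.randint(1, p - 1)
    positions = []
    p_hash = poly_hash(pattern, p, x)
    m = len(pattern)
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if poly_hash(window, p, x) != p_hash:
            continue                # hashes differ, so the strings differ
        if window == pattern:       # plays the role of AreEqual
            positions.append(i)
    return positions


print(rabin_karp_slow("abacaba", "aba"))  # [0, 4]
```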
False Alarms
A “false alarm” is the event when the hashes match, so P is compared with a substring S of T, but P ≠ S.
The probability of a “false alarm” is at most |P|/p.
On average, the total number of “false alarms” will be at most (|T| − |P| + 1)|P|/p, which can be made small by selecting p ≫ |T||P|.
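
As a rough numeric illustration of this bound, the sketch below evaluates (|T| − |P| + 1)|P|/p for two primes. The text and pattern lengths and both primes are assumptions chosen for the example; with p near 10^9 the bound is just under 1, while with p = 2^61 − 1 it drops below 10^-9.

```python
def expected_false_alarms_bound(text_len: int, pattern_len: int, p: int) -> float:
    """Upper bound (|T| - |P| + 1) * |P| / p on the expected number of false alarms."""
    return (text_len - pattern_len + 1) * pattern_len / p


# Illustrative sizes: |T| = 10**6, |P| = 10**3.
print(expected_false_alarms_bound(10**6, 10**3, 10**9 + 7))   # p comparable to |T||P|
print(expected_false_alarms_bound(10**6, 10**3, 2**61 - 1))   # p much larger than |T||P|
```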
Running Time without AreEqual
h(P) is computed in O(|P|)
h(T[i..i + |P| − 1]) is computed in O(|P|), |T| − |P| + 1 times
O(|P|) + O((|T| − |P| + 1)|P|) = O(|T||P|)
AreEqual Running Time
AreEqual is computed in O(|P|)
AreEqual is called only when h(P) = h(T[i..i + |P| − 1]), meaning that either an occurrence of P is found or a “false alarm” happened
By selecting p ≫ |T||P| we make the number of “false alarms” negligible
Total Running Time
If P is found q times in T, then the total time spent in AreEqual is on average O((q + (|T| − |P| + 1)|P|/p)|P|) = O(q|P|) for p ≫ |T||P|
Total running time is on average O(|T||P|) + O(q|P|) = O(|T||P|), as q ≤ |T|
Analysis
O(|T||P|) is the same as the running time of the naive algorithm, but it can be improved!
The second summand O(q|P|) is unavoidable, as we need to check each of the q occurrences of P in T
The first summand O(|T||P|) is so big because we compute the hash of each substring of T separately
This can be optimized; see the next video
Recurrence Equation for Substring Hashes

Idea
Polynomial hash:
$h(S) = \sum_{i=0}^{|S|-1} S[i] x^i \bmod p$

Idea: polynomial hashes of two consecutive substrings of T are very similar
For each i, denote h(T[i..i + |P| − 1]) by H[i]
Consecutive substrings
T = b e a c h
encode(T) = 1 4 0 2 7, |P| = 3
$h(\text{"ach"}) = 0 + 2x + 7x^2$
$h(\text{"eac"}) = 4 + 0x + 2x^2$
Multiplying the terms 0 and 2x of h("ach") by x gives the terms 0x and 2x^2 of h("eac").
$H[2] = h(\text{"ach"}) = 0 + 2x + 7x^2$
$H[1] = h(\text{"eac"}) = 4 + 0x + 2x^2 =$
$= 4 + x(0 + 2x) =$
$= 4 + x(0 + 2x + 7x^2) - 7x^3 =$
$= xH[2] + 4 - 7x^3$
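
The identity H[1] = xH[2] + 4 − 7x^3 can be checked numerically. The sketch below does so for one fixed x; the modulus, the value x = 31, and the helper name `poly_hash` are assumptions chosen only to make the output reproducible.

```python
def poly_hash(codes: list[int], p: int, x: int) -> int:
    """h(S) = sum of codes[i] * x**i, taken mod p."""
    h = 0
    for c in reversed(codes):      # Horner's rule, highest power first
        h = (h * x + c) % p
    return h


p = 1_000_000_007                  # assumed prime modulus
x = 31                             # arbitrary fixed value, for reproducibility
T = [1, 4, 0, 2, 7]                # encode("beach") from the slide
m = 3                              # |P|

H2 = poly_hash(T[2:5], p, x)       # h("ach") = 0 + 2x + 7x^2
H1 = poly_hash(T[1:4], p, x)       # h("eac") = 4 + 0x + 2x^2

# H[1] = x*H[2] + 4 - 7x^3 (mod p)
assert H1 == (x * H2 + T[1] - T[1 + m] * pow(x, m, p)) % p
print(H1, H2)                      # 1926 6789
```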
Recurrence Equation for H[i]
$H[i+1] = \sum_{j=i+1}^{i+|P|} T[j] x^{j-i-1} \bmod p$
$H[i] = \sum_{j=i}^{i+|P|-1} T[j] x^{j-i} \bmod p =$
$= \left( \sum_{j=i+1}^{i+|P|} T[j] x^{j-i} + T[i] - T[i+|P|] x^{|P|} \right) \bmod p =$
$= \left( x \sum_{j=i+1}^{i+|P|} T[j] x^{j-i-1} + (T[i] - T[i+|P|] x^{|P|}) \right) \bmod p$

$H[i] = \left( x H[i+1] + T[i] - T[i+|P|] x^{|P|} \right) \bmod p$

Using Recurrence Equation
$H[i] = \left( x H[i+1] + T[i] - T[i+|P|] x^{|P|} \right) \bmod p$
$x^{|P|}$ can be computed once and saved
Using this recurrence equation, H[i] can be computed in O(1) given H[i + 1] and $x^{|P|}$
See the next video to learn how this improves the running time of Rabin-Karp
Improving Running Time

Use Precomputation
Use the recurrence equation to precompute all hashes of substrings of T of length equal to |P|
Then proceed the same way as in the original Rabin-Karp algorithm implementation
PrecomputeHashes(T, |P|, p, x)
    H ← array of length |T| − |P| + 1
    S ← T[|T| − |P|..|T| − 1]
    H[|T| − |P|] ← PolyHash(S, p, x)
    y ← 1
    for i from 1 to |P|:
        y ← (y · x) mod p
    for i from |T| − |P| − 1 down to 0:
        H[i] ← (xH[i + 1] + T[i] − yT[i + |P|]) mod p
    return H

Running time: O(|P| + |P| + |T| − |P|) = O(|T| + |P|)

Precomputing H
PolyHash is called once: O(|P|)
$x^{|P|}$ is computed in O(|P|)
All values of H are computed in O(|T| − |P|)
Total precomputation time is O(|T| + |P|)
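
The PrecomputeHashes procedure might look as follows in Python. This is a sketch under stated assumptions: character codes come from `ord`, the pattern length m is at most the text length, and the built-in three-argument `pow` replaces the explicit loop that computes x^|P| mod p.

```python
def poly_hash(s: str, p: int, x: int) -> int:
    """Polynomial hash: sum of ord(s[i]) * x**i, taken mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """H[i] = hash of text[i..i + m - 1] for every valid i (assumes m <= len(text))."""
    n = len(text)
    H = [0] * (n - m + 1)
    H[n - m] = poly_hash(text[n - m:], p, x)   # last window, O(m)
    y = pow(x, m, p)                           # x^|P| mod p, computed once
    for i in range(n - m - 1, -1, -1):
        # Python's % returns a non-negative result for a positive modulus,
        # so the subtraction needs no extra correction here.
        H[i] = (x * H[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return H
```

In languages where the remainder of a negative number can be negative (C, C++, Java), the result would need to be shifted back into the range 0..p − 1 after the subtraction.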
RabinKarp(T, P)
    p ← big prime, x ← random(1, p − 1)
    positions ← empty list
    pHash ← PolyHash(P, p, x)
    H ← PrecomputeHashes(T, |P|, p, x)
    for i from 0 to |T| − |P|:
        if pHash ≠ H[i]:
            continue
        if AreEqual(T[i..i + |P| − 1], P):
            positions.Append(i)
    return positions
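
Putting the pieces together, a possible Python version of the improved algorithm is sketched below. The prime, the `ord`-based character codes, and the early return when the pattern is longer than the text are assumptions added for the example; the pseudocode does not spell them out.

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Polynomial hash: sum of ord(s[i]) * x**i, taken mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """Hashes of all windows of length m, via the recurrence for H[i]."""
    n = len(text)
    H = [0] * (n - m + 1)
    H[n - m] = poly_hash(text[n - m:], p, x)
    y = pow(x, m, p)
    for i in range(n - m - 1, -1, -1):
        H[i] = (x * H[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return H


def rabin_karp(text: str, pattern: str) -> list[int]:
    n, m = len(text), len(pattern)
    if m > n:
        return []                    # added guard: no window of length m exists
    p = 1_000_000_007                # assumed "big prime"
    x = random.randint(1, p - 1)
    p_hash = poly_hash(pattern, p, x)
    H = precompute_hashes(text, m, p, x)
    positions = []
    for i in range(n - m + 1):
        if H[i] != p_hash:
            continue                 # O(1) rejection per position
        if text[i:i + m] == pattern: # plays the role of AreEqual
            positions.append(i)
    return positions


print(rabin_karp("abacaba", "aba"))  # [0, 4]
```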
Improved Running Time
h(P) is computed in O(|P|)
PrecomputeHashes runs in O(|T| + |P|)
Total time spent in AreEqual is O(q|P|) on average (for large enough prime p), where q is the number of occurrences of P in T
Total running time on average is O(|T| + (q + 1)|P|)
Usually q is small, so this is much less than O(|T||P|)
Conclusion
Hash tables are useful for storing Sets and Maps
It is possible to search and modify hash tables in O(1) on average!
Must use good hash families and randomization
Hashes are also useful while working with strings and texts
There are many more applications, including blockchain; see the next video!
