Exact String Matching Using Suffix Trees

Uploaded by

pavithra.r

Exact string matching with a suffix tree finds all occurrences of a pattern P in a text T
using a pre-built suffix tree of T. Once the tree exists, deciding whether P occurs takes
O(m) time, where m is the length of the pattern; reporting all k occurrences takes O(m + k).

Steps for Exact String Matching with Suffix Tree

1. Build the Suffix Tree:
o Create a suffix tree for the given text T.
o Append a special character (e.g., $) that does not occur in T, so that no suffix is a
prefix of another and every suffix ends at its own leaf.
2. Search for the Pattern:
o Start at the root of the suffix tree and match the characters of the pattern P along
the edge labels, following at each node the edge whose label begins with the next
character of P.
o If all of P is matched (possibly ending in the middle of an edge):
• Return the suffix start positions stored at all leaf nodes beneath the point where
the match ended.
o If a character mismatches or no suitable edge exists:
• Terminate and return no matches.
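
The two steps above can be sketched in Python. This is a minimal illustration, not a production implementation: to stay short it builds an uncompressed suffix trie (one character per edge) by naive insertion, which costs O(n^2) time and space instead of the O(n) of a true compressed suffix tree. The names Node, build_suffix_trie and find_all are hypothetical and chosen only for this example.

```python
class Node:
    """One node of an uncompressed suffix trie (one character per edge)."""
    def __init__(self):
        self.children = {}   # character -> child Node
        self.start = None    # suffix start index, set only on leaves


def build_suffix_trie(text, terminator="$"):
    """Step 1: insert every suffix of text + terminator (naive, O(n^2))."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    root = Node()
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.children.setdefault(ch, Node())
        node.start = i       # the unique terminator makes this node a leaf
    return root


def find_all(root, pattern):
    """Step 2: return sorted 0-based start positions of pattern, or []."""
    node = root
    for ch in pattern:       # walk down from the root, one step per character
        node = node.children.get(ch)
        if node is None:
            return []        # mismatch: the pattern does not occur
    positions, stack = [], [node]
    while stack:             # collect every leaf below the matched node
        cur = stack.pop()
        if cur.start is not None:
            positions.append(cur.start)
        stack.extend(cur.children.values())
    return sorted(positions)


root = build_suffix_trie("BANANA")
print(find_all(root, "ANA"))   # [1, 3]
print(find_all(root, "NAB"))   # []
```

The walk down the trie takes O(m) steps, as in a real suffix tree. The leaf collection here visits the whole subtree, so it is slower than the O(k) achievable on a compressed tree.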

Example

Input:

• Text: BANANA
• Pattern: ANA

Step 1: Build the Suffix Tree

For the text BANANA$, the suffixes are:

• BANANA$
• ANANA$
• NANA$
• ANA$
• NA$
• A$
• $

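The suffix tree looks like this (chains of single-child nodes are compressed into one edge label;
the number in brackets at each leaf is the 0-based start index of its suffix):

(root)
├── $ [6]
├── A
│   ├── $ [5]
│   └── NA
│       ├── $ [3]
│       └── NA$ [1]
├── BANANA$ [0]
└── NA
    ├── $ [4]
    └── NA$ [2]

Step 2: Search for the Pattern ANA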

1. Start at the root and follow the edges labeled with the characters of ANA.
o Match A → Follow the edge A from the root.
o Match NA → Follow the edge NA below that node; the pattern is consumed exactly at the
end of this edge.
2. After matching the entire pattern ANA:
o The path ends at an internal node.
o Collect all leaf nodes below this node to get the starting positions of matches.
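
The walk can be traced on a hand-built copy of the compressed suffix tree for BANANA$. In this sketch the tree is written as nested dictionaries, an encoding assumed purely for illustration: keys are edge labels, integer values are leaves holding the suffix start index. The names TREE, leaves and search are hypothetical.

```python
# Compressed suffix tree of "BANANA$", written out by hand.
TREE = {
    "$": 6,
    "A": {"$": 5, "NA": {"$": 3, "NA$": 1}},
    "BANANA$": 0,
    "NA": {"$": 4, "NA$": 2},
}


def leaves(node):
    """Return the suffix start indices of all leaves at or below node."""
    if isinstance(node, int):
        return [node]
    found = []
    for child in node.values():
        found.extend(leaves(child))
    return found


def search(tree, pattern):
    """Match pattern along edge labels; return sorted start positions."""
    node, rest = tree, pattern
    while rest:
        if isinstance(node, int):
            return []                    # reached a leaf with pattern left over
        for label, child in node.items():
            k = min(len(label), len(rest))
            if label[:k] == rest[:k]:    # edge agrees with the pattern so far
                node, rest = child, rest[k:]
                break
        else:
            return []                    # no edge continues the pattern
    return sorted(leaves(node))


print(search(TREE, "ANA"))   # [1, 3]
print(search(TREE, "AN"))    # [1, 3]  (pattern ends in the middle of edge NA)
```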

Result:

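• The leaf nodes below the matched node correspond to the suffixes ANANA$ and ANA$, which
start at indices 1 and 3.
• Matches found at positions 1 and 3 (0-based index) in the text BANANA. The two matches
overlap, which the suffix tree handles without special treatment.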
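
As a sanity check, the same positions can be obtained without any tree by scanning the text directly. The helper below, naive_find_all (a name made up for this example), uses Python's built-in str.find and advances one character at a time so that overlapping matches are kept.

```python
def naive_find_all(text, pattern):
    """Return all 0-based start positions of pattern in text, overlaps included."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)   # step by 1 so overlapping matches are kept
    return positions


print(naive_find_all("BANANA", "ANA"))   # [1, 3]
```

This scan re-reads the text on every query, which is the cost the pre-built suffix tree avoids.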

Advantages

1. Efficient Search: Once the suffix tree is built, checking whether a pattern occurs takes
O(m) time, where m is the pattern length, independent of the length of the text.
2. Multiple Matches: Every occurrence of the pattern is a leaf below the matched node, so all
k occurrences can be listed in O(k) additional time.

Applications

1. Text Search: Quickly locate substrings in large texts.
2. Plagiarism Detection: Find repeated or matching segments across documents.
3. Bioinformatics: Search for DNA or protein sequences in genomic data.
4. Data Compression: Detect repeated patterns for compression algorithms.

Time Complexity

1. Suffix Tree Construction: O(n), where n is the length of the text (using Ukkonen's
algorithm, assuming a constant-size alphabet).
2. Search: O(m), where m is the length of the pattern, plus O(k) to report all k occurrences.

Suffix trees provide a robust method for string matching, especially in applications that run
many queries against the same large text, since the O(n) construction cost is paid only once.
