Make delicious recipes!

Uses of suffix trees

1) Finding a string that occurs more than ‘m’ times in the text

A node (with contents from root to that node as ‘c’) which has ‘k’ leaf level children indicates that ‘c’ occurs ‘k’ times in the text. For example, in the suffix tree above, ‘e’ has 3 leaf-level-children, so it must occur 3 times in ‘peeper’. Similarly, ‘pe’ has 2 leaf level children, so it occurs 2 times in ‘peeper’.

Hence to find a string that occurs more than ‘m’ times in a text, we find all nodes in the trie which have more than ‘m’ leaf level children. For all those nodes, strings from root to those nodes occur more than ‘m’ times.

2) To find all the occurrences of a pattern

We search the pattern normally as we search a given string. If the search terminates at a leaf, then the pattern occurs exactly once, else all its occurrences can be found in the paths leading from that node to all its leaves. To further speed this up, one can link all the leaves of the trie into a linked list and keep first-leaf and last-leaf in every node. To find all occurrences then, we can directly traverse from first-leaf to last-leaf.

3) Common substring among N substrings

To find common substring (not subsequence*) among N strings, following can be done:

1) Append a unique end-of-string marker to each input string like $1, $2, $3 etc.

2) Form a suffix tree by getting suffixes from all the above strings.

3) In this suffix tree, find the node which has the longest depth from the root and its leaf nodes have at least one suffix node in each of the input strings ($1, $2, $3... end-of-string markers help here to determine that the chosen node’s leaves are present in all the strings).

* Elements in a subsequence need not be contiguous like a substring

4) Longest palindrome in a string

Palindrome is a string which reads the same from front to end as it reads from end to front. Example: abcdcba

To find the longest palindrome in string S1, following can be done:

1) Reverse the string S1 to get another string S2

  1. Find the longest repeated substring in S1 and S2 by making a common suffix tree for both the strings.

  2. A naïve solution for this could be to check every point in the string and note the length of the maximum palindrome around it. Order for this solution would be O(n2)

5) Longest Repeated Substring (LRS)

This is a variation of #1 above. We need to find the deepest node which has more than 1 children. The characters from root to that deepest node form the longest repeated substring and the leaves form those suffixes where that LRS is present.

A naive LRS finding algorithm is shown below:

( Note the order of searching LRS is O(n)3 )

Like us on Facebook to remain in touch
with the latest in technology and tutorials!

Got a thought to share or found a
bug in the code?
We'd love to hear from you:

Email: (Your email is not shared with anybody)

Facebook comments:

Site Owner: Sachin Goyal