Make delicious recipes!

Pattern Matching (Knuth Morris Pratt algorithm)


Problem Statement: Find all occurrences of pattern[] of length 'm' in txt[] of length 'n'.

If we do a strcmp at every index of txt, then it would be O(mn).
This can be reduced to O(m+n) by using KMP algorithm.

Solution: KMP algorithm makes use of the match done till now and does not begin searching all over again in case a mismatch is found while searching for a match.

Let lps be defined as Longest Prefix Suffix which indicates how many characters from the beginning match the ending characters of a string.

Example:

If pattern = [A,b,c,A,b,c,1,2], then
lps array is [0,0,0,1,2,3,0,0]


Now, suppose we begin matching pat in txt after constructing the above array.
We continue matching characters in pattern and txt till a mismatch occurs.
Suppose mismatch happens at txt[i] and pattern[j] characters.

At this point, we check the lps at index j.
If its non-zero, then it means that the 0 to lps[j] characters from pattern
are still matching in txt from txt[i-lps[j]] to txt[i] because by
definition, it means that pattern[0] to pattern[lps[j]] characters are
matching from pat[j-lps[j]] to pattern[j]


Java Code:

public class SearchAlgoKMP 
{

    public static void main (String args[])
    {
        String pattern = "AbcAbc12";
        String txt = "AbcAbcAbc12";
        kmp(pattern, txt);
    }

    static void computeLPS(String pattern, int[] lps) 
    {
        int currMaxLPS = 0;
        lps[0] = 0;

        for (int i=1; i < pattern.length();)
        {
            if(pattern.charAt(i) == pattern.charAt(currMaxLPS))
            {
                currMaxLPS++;
                lps[i] = currMaxLPS;
                i++;
            }
            else
            {
                if( currMaxLPS != 0 )
                {
                    // Note that lps[currMaxLPS-1] contains the largest
                    // prefix of pattern that matches the suffix ending
                    // at pattern[currMaxLPS-1]
                    // So we try to match from that prefix (which has to be
                    // smaller than current prefix)
                    // Observe that i is not incrementing here.
                    currMaxLPS = lps[currMaxLPS-1];
                }
                else
                {
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }


    /**
     * KMP is just a re-application of the LPS concept shown above.
     * Its algorithm is very similar to the above one.
     */
    static void kmp(String pattern, String txt) 
    {
        int lps[] = new int[pattern.length()];
        computeLPS (pattern, lps);

        int txtPos=0, patternPos=0;
        while (txtPos < txt.length())
        {
            if(pattern.charAt(patternPos) == txt.charAt(txtPos))
            {
                patternPos++;
                txtPos++;
            }

            if (patternPos == pattern.length())
            {
                System.out.println
                    ("Pattern is found at index " + (txtPos-patternPos));   
                patternPos = lps[patternPos-1];
            }
            else if(pattern.charAt(patternPos) != txt.charAt(txtPos))
            {
                // mismatch occurs after j matches
                if(patternPos != 0)
                    patternPos = lps[patternPos-1];
                else
                    txtPos++;
            }
        }
    }
}

Sample execution:

Example 1:
A b c A b c A (String)
0 0 0 1 2 3 1 (LPS corresponding to it)



Example 2:


Pattern = AbcAbc12
Txt     = AbcAbcAbc12







Creation of LPS

i=1) currMaxLPS=0, lps[1]=0, Ab
i=2) currMaxLPS=0, lps[2]=0, Abc
i=3) currMaxLPS=1, lps[3]=1, AbcA
i=4) currMaxLPS=2, lps[4]=2, AbcAb
i=5) currMaxLPS=3, lps[5]=3, AbcAbc
i=6) currMaxLPS=0, lps[6]=0, AbcAbc1
i=6) currMaxLPS=0, lps[6]=0, AbcAbc1
i=7) currMaxLPS=0, lps[7]=0, AbcAbc12

Final lps = [0,0,0,1,2,3,0,0]










Searching using lps array

txtPos=0, patternPos=0, doneTxt=A, patternMatched=A
txtPos=1, patternPos=1, doneTxt=Ab, patternMatched=Ab
txtPos=2, patternPos=2, doneTxt=Abc, patternMatched=Abc
txtPos=3, patternPos=3, doneTxt=AbcA, patternMatched=AbcA
txtPos=4, patternPos=4, doneTxt=AbcAb, patternMatched=AbcAb
txtPos=5, patternPos=5, doneTxt=AbcAbc, patternMatched=AbcAbc
txtPos=6, patternPos=3, doneTxt=AbcAbcA, patternMatched=AbcA
txtPos=7, patternPos=4, doneTxt=AbcAbcAb, patternMatched=AbcAb
txtPos=8, patternPos=5, doneTxt=AbcAbcAbc, patternMatched=AbcAbc
txtPos=9, patternPos=6, doneTxt=AbcAbcAbc1, patternMatched=AbcAbc1
txtPos=10, patternPos=7, doneTxt=AbcAbcAbc12, patternMatched=AbcAbc12

Pattern is found at index 3









Like us on Facebook to remain in touch
with the latest in technology and tutorials!


Got a thought to share or found a
bug in the code?
We'd love to hear from you:

Name:
Email: (Your email is not shared with anybody)
Comment:

Facebook comments:

Site Owner: Sachin Goyal