Skip to main content

String Searching Algorithms (Part II)

Posted by potty on May 10, 2012 at 5:00 AM PDT

Knuth-Morris-Pratt (KMP)

Overview

According to Wikipedia, the Knuth-Morris-Pratt algorithm "searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters".



Being 'A' the portion of the pattern that fits with the text, 'B' is part of the text and 'f' is the length of 'A'. The KMP algorithm moves the pattern according to the characters of the pattern that fits with the text.

The main components of the KPM algorithm are the prefix function (Π) and the KMP matcher:

  • The prefix fuction (Π) keeps information about how pattern matches against shifts of itself. This information is to avoid useless shifts of the pattern.
  • The KMP matcher uses the prefix function, the pattern and the text as inputs to find the occurrence of the pattern within the text and returns the number of shifts after the first occurrence.

In conclusion, the KMP algorithm has the following characteristics:

  • The algorithm compares from left to right.
  • The algorithm preprocess the pattern.
  • The preprocessing phase time efficiency is Θ(m).
  • The searching phase time efficiency is Θ(m+n).
  • The algorithm accomplish at most 2n - 1 comparisons.

Advantages

  • Fast.
  • Simple.
  • It is good for processing large files.
  • Avoid comparisons with elements of 'B' that have previously been involved in comparison with some element of the pattern.

Disadvantages

  • Chances of mismatch increases with big sized alphabets.

Java Code

public class KnuthMorrisPratt {
public int[] prekmp(String pattern) {
int[] next = new int[pattern.length()];
int i=0, j=-1;
next[0]=-1;
while (i while (j>=0 && pattern.charAt(i)!=pattern.charAt(j))
j = next[j];
i++;
j++;
next[i] = j;
}
return next;
}

public int kmp(String text, String pattern) {
int[] next = prekmp(pattern);
int i=0, j=0;
while (i while (j>=0 && text.charAt(i)!=pattern.charAt(j))
j = next[j];
i++; j++;
if (j==pattern.length()) return i-pattern.length();
}
return -1;
}

}

public class Test {
public static void main(String[] args) {
KnuthMorrisPratt k = new KnuthMorrisPratt();
String text = "Lorem ipsum dolor sit amet";
String pattern = "ipsum";

int first_occur_position = k.kmp(text, pattern);
System.out.println("The text '" + pattern + "' is first found on the "
                                   + first_occur_position + " position.");
}
}

References

  1. Charras, Christian & Lecroq, Thierry. Exact String Matching Algorithms. Rouen University. France. 1997. Link.
  2. Wikipedia, the free encyclopedia (User: Almi). Knuth–Morris–Pratt algorithm. June 26, 2003. Link.
  3. Kumar Mandumula, Kranthi . Knuth-Morris-Pratt Algorithm. India. 2011. Link.
AttachmentSize
knuthmorrispratt.png3.9 KB
knuthmorrispratt.zip3.8 KB