# String Searching Algorithms (Part I)

Posted by potty on April 19, 2012 at 7:07 AM PDT

# String searching algorithm (SSA)

According to Wikipedia, string searching algorithms are "an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text".

Some important concepts that you need to understand before reading this entry are pattern, string and alphabet:

• Pattern: Set of elements that represent a predictable manner.
• Alphabet: Set of symbols. For example human alphabet (A~Z), binary alphabet (1,0) and DNA alphabet (A,C,G,T).
• String: Finite set of elements chose from an alphabet.

There are multiple categories of SSA. They basically depend on the number of patterns you chose. For example, there is single pattern, finite set of patterns and infinite set of patterns algorithms. The most common single pattern SSA are:

• Brute-force searching algorithm.
• Knuth-Morris-Pratt algorithm.
• Boyer-Moore algorithm.

In this entry (and next two entries) I will expose all the occurrences of a pattern within a text using a single pattern Java code (tested using Oracle JDK and OpenJDK). For convention, I used m as pattern size and n as text size.

## Brute-force searching algorithm (BFSA)

### Overview

According to Wikipedia brute-force search is "a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement".

The BFSA aligns the first position of the pattern with the first position of the text and the algorithm compares their characters one by one until all pattern characters are found or a mismatch is detected.

In conclusion, the BFSA has the following characteristics:

• The pattern is not preprocessed.
• The algorithm compares from left to right character by character.
• The worst time efficiency is ?(mn) comparisons.
• The algorithm returns the first occurrence of the pattern.

• Simple.
• Rapid approach.
• It is widely used.

• Too slow.
• Not scalable.

### Java Code

``public class BruteForceSearch {	private char[] text;	private char[] pattern;	private int n;	private int m;		public void setString(String t, String p) {		this.text = t.toCharArray();		this.pattern = p.toCharArray();		this.n = t.length();		this.m = p.length();	}		public int search() {		for (int i = 0; i < n-m; i++) {			int j = 0;			while (j < m && text[i+j] == pattern[j]) {				j++;			}			if (j == m) return i;		}		return -1;	}}class Test {	public static void main(String[] args) {		BruteForceSearch bfs = new BruteForceSearch();		String text = "Lorem ipsum dolor sit amet";		String pattern = "ipsum";				bfs.setString(text, pattern);		int first_occur_position = bfs.search();		System.out.println("The text '" + pattern + "' is first found after the " + first_occur_position + " position.");	}}``

## References

1. Charras, Christian & Lecroq, Thierry. Exact String Matching Algorithms. Rouen University. France. 1997. Link.
2. Wikipedia, the free encyclopedia (User: Stevertigo). Brute-force search. October 13, 2002. Link.
3. Wikipedia, the free encyclopedia (User: Kragen). String searching algorithm. October 30, 2001. Link.
4. Levitin, Anany. Introduction to the Design & Analysis of Algorithms. Pearson Addison-Wesley. Third Chapter. Second Edition. 2007.
AttachmentSize
bruteforcesearch.zip3.7 KB
bruteforcesearch.png9.51 KB
Related Topics >>