Matching Strings: An Introduction to Knuth-Morris-Pratt (KMP) Algorithm

Matching Strings: An Introduction to Knuth-Morris-Pratt (KMP) Algorithm

6 mins read2.4K Views Comment
Updated on Jul 7, 2023 13:13 IST

Knuth-Morris-Pratt or KMP is a linear string matching algorithm with low time complexity of O(n+m), where n is the string length and m is the pattern length. In this article, we will what KMP algorithm is, how it works, algorithm for LPP table.

2023_02_MicrosoftTeams-image-187.jpg

In this article, we will understand the working of the KMP string-matching algorithm. We will also solve an example using the KMP algorithm. Before starting this article, we will first understand the need for the KMP algorithm and what were the problems of the naïve string-matching algorithm.

Table of Content

Introduction

KMP is a linear string-matching algorithm with low time complexity. It is the solution to the drawbacks of previous text-matching algorithms. You probably have used the advantages of the string-matching algorithm in your life. For example, while working on Microsoft applications like Microsoft Word, you might use the “Find” option to identify similar words. A pattern-matching algorithm supports the functionality of this Microsoft Word feature. In this article, we will understand the working of the KMP string-matching algorithm. We will also solve an example using the KMP algorithm.

To begin with, we should first understand why the KMP algorithm was needed and what the naive string-matching algorithms had problems with.

Know more about Data Structure and Algorithms

Must Check: Data Structure and Algorithms Online Courses and Certificates

What is the Need for the KMP Algorithm?

To match a pattern and a string an application use string-matching algorithms. The simplest algorithm is the basic or naïve string-matching algorithm.

The basic algorithms have some drawbacks and are not effective enough to be used for string-matching applications.

  • The basic string-matching algorithms consume more time.
  • They use back-tracking whenever there is a mismatch in the string character and a matching pattern character.
  • With the increasing length of the sample pattern and original string, it is not feasible to repeat the matching process whenever there is a mismatch.
  • The repeated process consumes more memory and time.
  • Their time complexity is O(n*m), where n is the length of the text or string and m is the length of the pattern.

To overcome the problems with the naïve algorithm. There was a need for a fruitful and effective string-matching algorithm. This requirement gives birth to the KMP algorithm.

Adjacency Matrix For Graphs
Adjacency Matrix For Graphs
An Adjacency Matrix is a method of representing graphs in matrix form. The adjacency matrix plays a vital role in describing finite graphs, making them easier to understand and compact...read more
Understanding Adjacency List
Understanding Adjacency List
In this article, we will discuss an adjacency list representation of its implementation with an example; we will also discuss applications of an adjacency list along with its advantages and...read more
Difference between Stack and Queue
Difference between Stack and Queue
This article compares stacks and queues in programming. Stacks use LIFO to undo mechanisms and function calls, while queues use FIFO to manage processes, such as printing jobs or CPU...read more

What is KMP Algorithm?

KMP is the fastest string-matching algorithm. Its full form is Knuth Morris Pratt Algorithm. The KMP algorithm got this name from the name of its inventors. In 1970, James H Morris, Vaughan Pratt, and Donald Knuth invented this linear string-matching algorithm.

The KMP is the only string-matching algorithm with a time complexity of O(n+m), where n is the string length and m is the pattern length.

In string-matching algorithms, there are two terminologies: sting or text and pattern.

  • String or text is the original string that is used for matching.
  • The pattern is the sample text which is to be matched.
All About Kruskal’s Algorithm
All About Kruskal’s Algorithm
Kruskal’s algorithm is based on a greedy approach, whose goal is to find the shortest path in a graph with a minimum cost.
Understanding Dijkstra’s Algorithm
Understanding Dijkstra’s Algorithm
Dijkstra’s algorithm or Dijkstra’s shortest path algorithm or single source shortest path algorithm is used in identifying the shortest path from starting node to a destination node in a weighted...read more
Introduction to Bellman Ford Algorithm
Introduction to Bellman Ford Algorithm
Bellman Ford Algorithm is a Dynamic programming-based single source shortest path algorithm. Bellman Ford begins with a vertex and finds the shortest distance between other vertices.

Working of KMP Algorithm with an Example

In this section, we will discuss the working of the KMP algorithm with an example. In the KMP algorithm, an index number is used to track each character of the string and pattern. A variable is used to trace the index numbers.

The matching of each character of text and pattern occurs between two strings.

There is a pie table or Longest Proper Prefix (LPP) table. For making an NPP table, there are two terms known as prefixes and suffixes in a string.

There is a pie table or Longest Proper Prefix (LPP) table. For making an NPP table, there are two terms prefixes, and suffixes in a string.

For example, in a string: 

a b a b

  • The prefix for the above string can be: a, ab, or aba. The prefix excludes the last string character.
  • The suffix for the above string can be: b, ba, or bab. It excludes the first character of the string.

Consider an example to make an LPP table.

String = ababcb

And the LPP table is shown below:

Index value: 1     2    3     4    5

a b a b d
0 0 1 2 0

Sample string

LLP values            

  • We check that, in the string the first “a” comes for the first time so its value is 0.
  • “b” comes the first time in the string so its value is 0.
  • The next character in the string is “a” it is already present with an index value of 1 so its value will be 1.
  • The next character “b” is repeating and it is present at index value 2, so its value will be 2.
  • String character “c” comes the first time in the string, so its value will be 0.

This is how you can make an LPP table by using the same index values for repeating characters.

All About Floyd-Warshall Algorithm
All About Floyd-Warshall Algorithm
Floyd Warshall is a dynamic programming algorithm used to solve all pair shortest path problems. Dynamic Programming is an approach used in data structure and algorithms for solving problems efficiently,...read more
Introduction to Greedy Algorithm
Introduction to Greedy Algorithm
The greedy algorithm in data science is an approach to problem-solving that chooses the best choice based on the circumstances at hand. In this article, we will briefly discuss Greedy...read more
Breadth-First Search Algorithm
Breadth-First Search Algorithm
A graph traversal algorithm called breadth-first search investigates every node in the graph starting with the root node. In this article, we will briefly discuss BFS algorithm.

Working of the Knuth Morris Pratt Algorithm

String = ababcabcab

2023_02_image-152.jpg
2023_02_image-153.jpg

We take two variables “i” and “j” respectively for the index values of the string and pattern. The value of variable j is 0. With every value of variable i, the next value of j will be used for comparison.

  • When i = 1, j= 0.
2023_02_image-154.jpg
2023_02_image-155.jpg
  • When i =2, j = 1.
2023_02_image-156.jpg
2023_02_image-157.jpg
  • When i = 3, j = 2.
2023_02_image-158.jpg
2023_02_image-159.jpg
  • When i = 4, j = 3.
2023_07_Knuth-Morris-Pratt-Algorithm-1.jpg
2023_02_image-161.jpg
  • When i = 5, j = 4.
2023_02_image-162.jpg
2023_04_kmp-algo.jpg
  • So, the value of variable i = 5 and j at 5 denotes the value 0 as per LLP table value of character d. The value of variable i will increase and it does not start from the starting point like in a naïve string-matching algorithm.
  • Now, i = 6, j = 0
2023_02_image-164.jpg
2023_07_Knuth-Morris-Pratt-Algorithm-2.jpg
  • When i = 7, j = 1.
2023_02_image-166.jpg
2023_02_image-167.jpg
  • When i = 8, j = 2.
2023_02_image-168.jpg
2023_02_image-169.jpg
  • It is a mismatch at j = 2. Looking at its value in the LLP table. The value of j = 0 and i will increase to the next value.
  • When i = 9, j = 0.

Comparing the next value with the “i” value.

2023_02_image-170.jpg
  • When i =10, j = 1.
2023_02_image-171.jpg

In this way, we can match strings.

Algorithm for the LLP Table

  • Take two variables i and j to trace the pattern.
  • Initially, i <- 0, and j <- 2.
  • Set LLP [0] = 0. The LLP value of the first character is always 0.
  • if pattern [i] == pattern [j+1]

LLP [q] <- j+1

  • if pattern [i]! = pattern[j+1]

LLP [q] <- 0

Step by Step Knuth Morris Pratt Algorithm

  • Take two variables x and y to trace the index values of the string and pattern respectively.
  • x <- 1, y <- 0, len <- length of the string.
  • While x < string[len]

If string [x] == pattern [y + 1]

It is a match.

x <- x+1 and y <- y+1.

  • else If string [x]!= pattern [y+1]

It is a mismatch.

x <- x+1 and y <- LLP[y]

Conclusion

The KMP algorithm is used in plagiarism, DNA sequencing, checking word spelling, and many more. It is the fastest string-matching algorithm with an O(m+n) time complexity. The KMP algorithm is a little harder to understand than the naïve algorithm.

Contributed By: Sonal Meenu Singh

FAQs

How does the KMP algorithm work?

The KMP algorithm works by precomputing a "failure function" or "longest prefix suffix" array for the pattern string. This array helps determine the number of characters to skip when a mismatch occurs during the matching process. By utilizing this information, the algorithm avoids rechecking previously matched characters.

What is the time complexity of the KMP algorithm?

The time complexity of the KMP algorithm is O(n + m), where n is the length of the text string and m is the length of the pattern string. It achieves linear time complexity by avoiding unnecessary character comparisons during the matching process.

Are there any limitations to the KMP algorithm?

The KMP algorithm is primarily designed for exact string matching. It does not handle approximate or fuzzy matching scenarios, where slight variations in the pattern are allowed. For approximate matching, other algorithms like the Boyer-Moore algorithm or the Rabin-Karp algorithm may be more suitable.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio