Rabin-Karp Algorithm
#string #pattern-matching
The Rabin-Karp Algorithm is a string search algorithm that uses hashing to find
occurrences of a pattern in a text. It is particularly efficient when there are multiple
patterns to search for or when checking a large text for multiple occurrences of a pattern.
Key Idea
The main idea behind Rabin-Karp is to compute a hash value for the pattern and
compare it with the hash values of substrings of the text. Instead of recalculating the
hash of each substring from scratch, a rolling hash technique is used to update the hash
as the window slides over the text.
Steps of the Algorithm
1. Compute the hash of the pattern:
Calculate the hash for the given pattern using a hash function.
2. Sliding window approach:
Use a sliding window to extract substrings of the text with the same length as the
pattern.
3. Compute the hash of each substring:
Calculate the hash for each substring of the text using the rolling hash technique.
4. Compare the hashes:
If the hash of a substring matches the hash of the pattern, compare the actual
characters to avoid hash collisions.
5. Return results:
Return the starting indices of all matches.
Rolling Hash
To compute the hash of a string S of length mm, we use the formula:
m−1 m−2
hash(S) = (S[0] ⋅ d + S[1] ⋅ d + ⋯ + S[m − 1]) mod q
Where:
d is the base (typically a large number like 256 for character encoding).
q is a large prime number to minimize collisions.
S[i] represents the i-th character of the string S.
Rolling Hash Update
To update the hash for the next substring of length m, we use the rolling hash formula:
m−1
newH ash = (d ⋅ (oldH ash − S[i] ⋅ d ) + S[i + m]) mod q
Where:
S[i] is the character that is being "removed" from the window.
S[i + m] is the new character being added to the window.
Time Complexity
Average Case:
The average time complexity is O(n + m), where nn is the length of the text and mm
is the length of the pattern. This is due to the efficient use of rolling hash.
Worst Case:
The worst-case time complexity can be O(n ⋅ m) due to hash collisions. In such cases,
we would have to verify the actual characters when hashes match.
Space Complexity
Space Complexity:
The space complexity is O(1) if we don't use extra storage for the hashes or other
structures. However, if we need to store multiple hashes (for multiple patterns), the
space complexity could be O(k), where k is the number of patterns.