Understanding the Rabin-Karp Algorithm in Java with Examples

30 Apr 2025

The Rabin-Karp algorithm is a well-known string-matching algorithm built on hashing. Created in 1987 by Michael O. Rabin and Richard M. Karp, it is valued for its simplicity and its efficiency at finding patterns in long texts. Applications such as search engines, DNA analysis, and plagiarism detection make extensive use of it. Fundamentally, Rabin-Karp converts strings into hash values so that comparisons can be made more quickly. A rolling hash improves efficiency further by updating the hash value incrementally as the search window moves, instead of recomputing it from scratch.

This blog delves into its fundamentals, step-by-step implementation, and practical applications in C++, Java, and Python.

What is the Rabin-Karp Algorithm?

The Rabin-Karp algorithm is a method for finding a specific pattern within a larger text. It works by using hashing, which is a way of converting text into numerical values. Instead of comparing the pattern with every substring character by character (which would be slow), it first calculates hash values for the pattern and for each substring of the text with the same length, and compares the actual characters only when the two hash values are equal.

Why Use Rabin-Karp?

The Rabin Karp algorithm is particularly useful when searching for multiple patterns within a large text, making it efficient for applications like plagiarism detection, search engines, and DNA sequencing. 

Its use of hashing and the rolling hash technique allows for faster pattern matching than naive character-by-character comparison. Instead of checking every substring in full, it first compares hash values, which reduces the number of direct character comparisons.

Fundamental Concepts

  1. Hashing: This is a process of converting a string (or any data) into a fixed numerical value, called a hash. It helps in quick data retrieval and comparison.
  2. Rolling Hash: Instead of recalculating the entire hash from scratch when shifting a window over a string (such as in substring searches), a rolling hash updates the hash value efficiently by removing the effect of the outgoing character and adding the effect of the incoming character.
  3. Prime Number Selection: Using prime numbers in hashing functions helps reduce the chances of different strings producing the same hash (hash collisions). Primes provide a more even distribution of hash values.

Step-by-Step Explanation Of Rabin Karp Algorithm

1. Preprocessing Phase

Before starting the Rabin Karp algorithm, certain parameters need to be chosen carefully to provide efficient and accurate pattern matching.

2. Choosing Parameters

To compute the hash values, two important parameters are selected:

  • Base (p): This is a constant value used in hash calculations. Common choices include 31 (a small prime) or 256 (the number of possible 8-bit character values, which is not itself prime).
  • Large Prime Modulus (q): Every hash value is reduced modulo q. A large prime such as 1,000,000,007 or 1,000,000,009 is a typical choice; a larger modulus lowers the chance of hash collisions but cannot guarantee unique hash values, which is why matches are verified later.
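
To make the effect of the modulus concrete, here is a rough sketch that counts how many distinct hash values all three-letter lowercase strings produce under a small and a large modulus. The helper names `full_hash` and `distinct_hashes` and the chosen parameter values are assumptions made for this illustration only.

```python
from itertools import product
from string import ascii_lowercase

def full_hash(s, base, mod):
    """Polynomial hash of s, reduced modulo mod."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def distinct_hashes(length, base, mod):
    """Count distinct hash values over all lowercase strings of the given length."""
    return len({full_hash("".join(chars), base, mod)
                for chars in product(ascii_lowercase, repeat=length)})

small = distinct_hashes(3, base=31, mod=101)
large = distinct_hashes(3, base=31, mod=1_000_000_009)
# With mod=101 there are at most 101 buckets for 17,576 strings, so collisions are unavoidable.
print(small, large)
```

With the small modulus many different strings must share a hash, so the search falls back on character comparisons far more often.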

3. Computing the Pattern Hash

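A hash value is computed for the given pattern using a polynomial formula: each character is treated as a digit in the chosen base, and the result is reduced modulo q. This converts the pattern into a compact numerical fingerprint that is cheap to compare with segments of the text, although different strings can occasionally share the same value.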
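
As a minimal sketch of this step, the polynomial hash can be evaluated with Horner's rule. The function name `polynomial_hash` and the defaults `base=256` and `mod=101` are illustrative assumptions that mirror the values used in the Python implementation later in this post.

```python
def polynomial_hash(s, base=256, mod=101):
    """Hash s as s[0]*base^(m-1) + ... + s[m-1]*base^0, reduced modulo mod."""
    h = 0
    for ch in s:
        # Horner's rule: shift the running value one position, then add the next character.
        h = (h * base + ord(ch)) % mod
    return h

print(polynomial_hash("abc"))
# Equal strings always produce equal hashes; the reverse is not guaranteed.
print(polynomial_hash("abc") == polynomial_hash("abc"))
```

Reducing after every step keeps the intermediate values small, which matters in languages with fixed-width integers.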

4. Precomputing Power Terms

To optimize the hash computation, the power of the base that weights the leading character of a window, p^(m-1) mod q for a pattern of length m, is calculated once in advance. This precomputed value is reused at every step of the rolling hash, avoiding redundant calculations as the algorithm progresses.
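
One way to sketch this precomputation is shown below. The function name `leading_power` is hypothetical; the point is that the modulus is applied after every multiplication, so the value never grows large, which is the same result Python's built-in three-argument `pow` returns.

```python
def leading_power(base, m, mod):
    """Return base^(m-1) % mod for a pattern of length m >= 1."""
    power = 1
    for _ in range(m - 1):
        # Reduce at every step so the intermediate value stays below mod * base.
        power = (power * base) % mod
    return power

# The built-in modular exponentiation gives the same value.
print(leading_power(256, 4, 101) == pow(256, 3, 101))
```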

5. Initial Window Setup

Before scanning the entire text, the hash value for the first window of text is computed. This initial computation sets up the rolling hash technique, allowing efficient comparisons as the window slides across the text.

6. Rolling Hash Calculation

Once the initial window is set, the algorithm slides the window across the text one character at a time. The hash value for each new window is updated efficiently rather than recalculating it from scratch. This process involves three essential steps:

  • Removing the Leading Character’s Contribution: The outgoing character's value, multiplied by the precomputed power of the base for the leading position, is subtracted from the current hash value. If the result is negative, the modulus is added back so the hash stays non-negative.
  • Multiplying by the Base: The remaining hash value is multiplied by the chosen base so that every remaining character moves up one position.
  • Adding the New Trailing Character: The next character in the text is added and the result is reduced modulo q, forming the hash value for the current window.
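
The three steps above can be sketched as a single constant-time update. The names `roll` and `full_hash` and the parameter defaults are assumptions for illustration; the loop at the end checks that every rolled value agrees with a hash computed from scratch.

```python
def full_hash(s, base=256, mod=101):
    """Polynomial hash computed from scratch, used here as a reference."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def roll(window_hash, outgoing, incoming, high, base=256, mod=101):
    """Slide the window one character to the right in O(1) time."""
    # Step 1: remove the outgoing character, which carries weight base^(m-1).
    window_hash = (window_hash - ord(outgoing) * high) % mod
    # Steps 2 and 3: multiply by the base, then add the incoming character.
    return (window_hash * base + ord(incoming)) % mod

text, m = "ababcabcab", 3
high = pow(256, m - 1, 101)
h = full_hash(text[:m])
for i in range(len(text) - m):
    h = roll(h, text[i], text[i + m], high)
    # The rolled value must equal a fresh hash of the new window.
    assert h == full_hash(text[i + 1:i + 1 + m])
print("rolling hash agrees with recomputation")
```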

7. Match Verification

When the hash values of the pattern and a text window are equal, an exact match is not guaranteed, because different strings can collide on the same hash. Therefore, a direct character-by-character comparison is performed to confirm the match. If the pattern and the text segment are identical, the position of the match is recorded.
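
To see why verification matters, the following sketch counts spurious hits, that is, windows whose hash equals the pattern's hash even though the text differs. The function name `find_with_verification` is hypothetical, and the deliberately small modulus of 101 is an assumption chosen to make a collision easy to trigger. For clarity the sketch recomputes each window hash instead of rolling it.

```python
def full_hash(s, base=256, mod=101):
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def find_with_verification(text, pattern, base=256, mod=101):
    """Return (match positions, number of spurious hash hits)."""
    m = len(pattern)
    target = full_hash(pattern, base, mod)
    matches, spurious = [], 0
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if full_hash(window, base, mod) == target:
            if window == pattern:
                matches.append(i)   # confirmed match
            else:
                spurious += 1       # same hash, different characters
    return matches, spurious

# "ju" and "ab" share a hash under base 256 and modulus 101.
print(find_with_verification("juab", "ab"))  # ([2], 1)
```

Without the character comparison, index 0 would be reported as a match even though the text there is "ju".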

Rabin Karp Algorithm Implementation in C++

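#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Assumes lowercase a-z input, since characters are mapped to 1..26
vector<int> rabin_karp(const string& pattern, const string& text) {
    const long long p = 31, mod = 1000000009LL;
    int m = pattern.size(), n = text.size();
    vector<int> occurrences;
    if (m == 0 || n < m) return occurrences;  // nothing to search

    long long pattern_hash = 0, text_hash = 0, high_pow = 1;

    // high_pow = p^(m-1) % mod, the weight of the leading character in a window
    for (int i = 0; i < m - 1; i++) high_pow = (high_pow * p) % mod;

    // Compute hash values for pattern and the first window of text
    // (the leading character gets the highest power so the hash can be rolled)
    for (int i = 0; i < m; i++) {
        pattern_hash = (pattern_hash * p + (pattern[i] - 'a' + 1)) % mod;
        text_hash = (text_hash * p + (text[i] - 'a' + 1)) % mod;
    }

    // Check for pattern match in text using rolling hash
    for (int i = 0; i <= n - m; i++) {
        // On a hash match, compare the characters to rule out a collision
        if (pattern_hash == text_hash && text.compare(i, m, pattern) == 0)
            occurrences.push_back(i);

        // Compute hash for next window
        if (i < n - m) {
            // Remove the leading character's contribution, keeping the value non-negative
            text_hash = (text_hash - (text[i] - 'a' + 1) * high_pow % mod + mod) % mod;
            // Shift by the base and append the incoming character
            text_hash = (text_hash * p + (text[i + m] - 'a' + 1)) % mod;
        }
    }
    return occurrences;
}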

int main() {
    string text = "ababcabcab";
    string pattern = "abc";

    vector<int> positions = rabin_karp(pattern, text);

    cout << "Pattern found at indices: ";
    for (int pos : positions) {
        cout << pos << " ";
    }
    cout << endl;

    return 0;
}

Explanation

The Rabin-Karp algorithm finds all occurrences of a pattern in a given text using hashing. First, it computes the hash values for the pattern and for the first substring of the text of the same length. Then, it slides through the text while updating the hash in constant time per step, checking for matches. If the hash values match, the window is compared with the pattern directly to confirm an exact match.

Output

Pattern found at indices: 2 5

Time and Space Complexity

  • Time Complexity: Best and average case: O(n + m). Worst case: O(nm), when every window's hash equals the pattern's hash and each one has to be verified.
  • Space Complexity: O(1) extra space for hashing, not counting the list of match positions that is returned.
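
The gap between the average and worst case can be observed by counting the character comparisons made during verification. The sketch below is an illustration under assumed parameters (`base=256`, `mod=101`), and `count_comparisons` is a hypothetical helper written for this post, not part of the implementation above.

```python
def count_comparisons(text, pattern, base=256, mod=101):
    """Run Rabin-Karp and return how many characters were compared during verification."""
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return 0
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    comparisons = 0
    for i in range(n - m + 1):
        if p_hash == w_hash:
            # Compare character by character, stopping at the first mismatch.
            for j in range(m):
                comparisons += 1
                if text[i + j] != pattern[j]:
                    break
        if i < n - m:
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return comparisons

n, m = 1000, 100
# Every window matches, so each of the n - m + 1 windows costs m comparisons.
print(count_comparisons("a" * n, "a" * m) == (n - m + 1) * m)
# No window matches; a colliding window costs at most one comparison here.
print(count_comparisons("ab" * (n // 2), "c" * m))
```

The first case is the O(nm) behaviour: the hashes are doing no filtering because every window is a genuine match.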

Rabin Karp Algorithm Implementation in Java

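public class RabinKarpSearch {
    private static final int PRIME = 101;
    private static final int BASE = 31;

    public static int searchPattern(String pattern, String text) {
        int patternLength = pattern.length();
        int textLength = text.length();
        // A pattern longer than the text can never match
        if (patternLength > textLength) {
            return -1;
        }
        long patternHash = computeHash(pattern, patternLength);
        long textHash = computeHash(text, patternLength);
        // BASE^(patternLength - 1) % PRIME, the weight of a window's leading character
        long highPower = modPow(BASE, patternLength - 1);

        for (int i = 0; i <= textLength - patternLength; i++) {
            // On a hash match, compare the characters to rule out a collision
            if (patternHash == textHash && text.regionMatches(i, pattern, 0, patternLength)) {
                return i;
            }
            if (i < textLength - patternLength) {
                textHash = recalculateHash(text, textHash, i, patternLength, highPower);
            }
        }
        return -1;
    }

    // Hashes the first `length` characters of str
    private static long computeHash(String str, int length) {
        long hash = 0;
        for (int i = 0; i < length; i++) {
            hash = (hash * BASE + str.charAt(i)) % PRIME;
        }
        return hash;
    }

    // Computes base^exponent % PRIME with integer arithmetic only;
    // Math.pow would lose precision or overflow for longer patterns
    private static long modPow(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result = (result * base) % PRIME;
        }
        return result;
    }

    private static long recalculateHash(String text, long oldHash, int index, int patternLength, long highPower) {
        // Remove the leading character's contribution and keep the value non-negative
        oldHash = (oldHash - text.charAt(index) * highPower) % PRIME;
        if (oldHash < 0) {
            oldHash += PRIME;
        }
        // Shift by the base and append the incoming character
        oldHash = (oldHash * BASE + text.charAt(index + patternLength)) % PRIME;
        return oldHash;
    }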

    public static void main(String[] args) {
        String text = "ababcabcab";
        String pattern = "abc";
        int result = searchPattern(pattern, text);
        System.out.println("Pattern found at index: " + result);
    }
}

Explanation

This program implements the Rabin-Karp algorithm for pattern matching. It computes a rolling hash for efficient substring comparison, checking whether the pattern exists in the text. If a match is found, it returns the starting index of the first occurrence; otherwise, it returns -1.

Output

Pattern found at index: 2

Time and Space Complexity

  • Time Complexity: Best and average case: O(n + m) (efficient due to the rolling hash). Worst case: O(nm) (when many hash collisions occur).
  • Space Complexity: O(1) (only a few extra variables are used).

Rabin Karp Algorithm Implementation in Python

def rabin_karp_search(text, pattern):
    base = 256  # Base value for hash computation (number of possible characters)
    prime = 101  # A prime number to minimize hash collisions
    m, n = len(pattern), len(text)
    result = []
    
    if m == 0 or n < m:
        return result
    
    hash_multiplier = pow(base, m - 1, prime)
    pattern_hash, window_hash = 0, 0
    
    # Compute initial hash values
    for i in range(m):
        pattern_hash = (base * pattern_hash + ord(pattern[i])) % prime
        window_hash = (base * window_hash + ord(text[i])) % prime
    
    # Slide through the text
    for i in range(n - m + 1):
        if pattern_hash == window_hash:
            if text[i:i + m] == pattern:
                result.append(i)
        
        if i < n - m:
            # Python's % returns a non-negative result for a positive modulus,
            # so no sign correction is needed after the subtraction.
            window_hash = (base * (window_hash - ord(text[i]) * hash_multiplier) + ord(text[i + m])) % prime
    
    return result

# Example usage
text_sample = "ABCCDABCDABCD"
pattern_sample = "ABCD"
print(rabin_karp_search(text_sample, pattern_sample))

Explanation

The Rabin-Karp algorithm is a string-searching technique that uses hash values to find occurrences of a pattern in a text efficiently. It first calculates hash values for the pattern and the first window of text, then slides through the text while updating the hash incrementally. If the hash values match, a direct character comparison confirms the match.

Output

[5, 9]

Time and Space Complexity

  • Time Complexity: Best and average case: O(n + m) (efficient hashing minimizes spurious matches). Worst case: O(nm) (if hash collisions occur frequently).
  • Space Complexity: O(1) (only a few extra variables are used, independent of input size, apart from the list of results).

Conclusion

The Rabin Karp algorithm is an efficient pattern-matching technique that utilizes hashing and the rolling hash method to search for patterns in a given text. It is useful for applications like plagiarism detection, DNA sequencing, and search engines due to its ability to handle multiple pattern searches efficiently. 

While its average time complexity is linear, it can slow down in cases of excessive hash collisions. Despite this limitation, its balance between simplicity and performance makes it a widely used algorithm for string-searching tasks.

Frequently Asked Questions

1. How does the Rabin-Karp algorithm handle collisions?

When two different substrings have the same hash (a collision), the algorithm performs a direct character-by-character comparison to verify if it's a real match.

2. What is the time complexity of Rabin-Karp?

On average, it runs in O(n + m) time, where n is the text length and m is the pattern length. However, in the worst case, it can take O(nm) time.

3. How does the rolling hash work in Rabin-Karp?

Instead of recalculating the hash from scratch, the rolling hash updates it efficiently when moving to the next substring. It removes the first character’s contribution and adds the new character at the end.

4. Where is the Rabin-Karp algorithm used?

It’s commonly used in plagiarism detection, DNA sequence matching, data deduplication, and network security.

5. Why does the algorithm use a prime number for hashing?

A large prime number helps spread hash values more evenly, reducing hash collisions and improving efficiency.

6. Can Rabin-Karp search for multiple patterns at once?

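Yes! By storing the hash of each pattern (for example, in a hash set), Rabin-Karp can check every text window against all patterns of the same length in one pass through the text, making it useful for large-scale searches. Patterns of different lengths need a separate rolling hash per length.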
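
A minimal sketch of that idea follows, assuming all patterns have the same length. The function name `search_many`, the dictionary-based lookup, and the parameter defaults are illustrative choices, not a canonical implementation.

```python
from collections import defaultdict

def search_many(text, patterns, base=256, mod=1_000_000_007):
    """Find all occurrences of several equal-length patterns in one pass over text."""
    lengths = {len(p) for p in patterns}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all patterns share one length")
    m, n = lengths.pop(), len(text)
    found = defaultdict(list)
    if m == 0 or n < m:
        return dict(found)

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each hash value to the patterns that produce it.
    by_hash = defaultdict(set)
    for p in patterns:
        by_hash[full_hash(p)].add(p)

    high = pow(base, m - 1, mod)
    w_hash = full_hash(text[:m])
    for i in range(n - m + 1):
        if w_hash in by_hash:
            window = text[i:i + m]
            if window in by_hash[w_hash]:  # verify against the actual strings
                found[window].append(i)
        if i < n - m:
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return dict(found)

print(search_many("ababcabcab", ["abc", "bca", "cab"]))
```

Each window costs one hash lookup regardless of how many patterns are stored, which is where the advantage over running a separate search per pattern comes from.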
