We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D||+|S|)·height(D)+m2+r) time with O(||D||+m2) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(||D||+|S|+m2+r).
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)