Collage system: A unifying framework for compressed pattern matching

Takuya Kida, Tetsuya Matsumoto, Yusuke Shibata, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa

Research output: Contribution to journal › Article

43 Citations (Scopus)

Abstract

We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods. It is a formal system that represents a string by a pair consisting of a dictionary D and a sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. Compressed pattern matching, the task of finding all occurrences of a pattern in a text without decompressing it, is one of the most active topics in string matching. Our framework covers compression methods such as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and static dictionary-based methods. The proposed algorithm runs in O((||D|| + |S|) · height(D) + m² + r) time with O(||D|| + m²) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For the subclass of the framework that contains no truncation, the time complexity is O(||D|| + |S| + m² + r).
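To make the idea of matching over (D, S) concrete, here is a small Python sketch for the truncation-free subclass only. It is a simplified illustration under stated assumptions, not the authors' algorithm: the tuple encoding of D ("char", "cat", "rep") and all function names are hypothetical. For every phrase X in D and every state q of a KMP automaton for the pattern, it precomputes bottom-up the state reached after reading X from q and the number of matches completed along the way, then scans S token by token without expanding any token. This uses O(||D|| · m) table entries (more for repetitions, which are unrolled k times) rather than the bounds stated above, and it counts occurrences instead of reporting their positions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical dictionary encoding (not the paper's notation). Variable ids
# are integers, and each definition refers only to smaller ids:
#   ("char", a)     X = a                      (primitive)
#   ("cat", i, j)   X = X_i X_j                (concatenation)
#   ("rep", i, k)   X = X_i repeated k times   (repetition)
Dictionary = Dict[int, Tuple]


def decode(D: Dictionary, S: List[int]) -> str:
    """Naive decompression of (D, S); used here only to check the matcher."""
    text: Dict[int, str] = {}
    for x in sorted(D):
        d = D[x]
        if d[0] == "char":
            text[x] = d[1]
        elif d[0] == "cat":
            text[x] = text[d[1]] + text[d[2]]
        else:  # "rep"
            text[x] = text[d[1]] * d[2]
    return "".join(text[x] for x in S)


def kmp_delta(pattern: str) -> Callable[[int, str], int]:
    """Build the KMP automaton transition function on states 0..m."""
    m = len(pattern)
    fail = [0] * (m + 1)  # fail[q]: longest proper border of pattern[:q]
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[q] != pattern[k]:
            k = fail[k]
        if pattern[q] == pattern[k]:
            k += 1
        fail[q + 1] = k

    def delta(q: int, c: str) -> int:
        if q == m:  # after a full match, fall back to the longest border
            q = fail[m]
        while q > 0 and pattern[q] != c:
            q = fail[q]
        return q + 1 if pattern[q] == c else 0

    return delta


def count_occurrences(D: Dictionary, S: List[int], pattern: str) -> int:
    """Count occurrences of a non-empty pattern in the text of (D, S)."""
    m = len(pattern)
    delta = kmp_delta(pattern)
    states = range(m + 1)
    jump: Dict[int, List[int]] = {}  # jump[x][q]: state after reading X_x from q
    out: Dict[int, List[int]] = {}   # out[x][q]: matches completed while reading X_x from q
    for x in sorted(D):  # bottom-up: definitions only use smaller ids
        d = D[x]
        if d[0] == "char":
            jump[x] = [delta(q, d[1]) for q in states]
            out[x] = [1 if jump[x][q] == m else 0 for q in states]
        elif d[0] == "cat":
            left, right = d[1], d[2]
            jump[x] = [jump[right][jump[left][q]] for q in states]
            out[x] = [out[left][q] + out[right][jump[left][q]] for q in states]
        else:  # "rep": apply X_i's tables k times from each start state
            i, k = d[1], d[2]
            jump[x], out[x] = [], []
            for q in states:
                p, hits = q, 0
                for _ in range(k):
                    hits += out[i][p]
                    p = jump[i][p]
                jump[x].append(p)
                out[x].append(hits)
    q, total = 0, 0
    for x in S:  # scan the token sequence without expanding any token
        total += out[x][q]
        q = jump[x][q]
    return total


if __name__ == "__main__":
    D = {1: ("char", "a"), 2: ("char", "b"), 3: ("cat", 1, 2),
         4: ("cat", 3, 1), 5: ("rep", 3, 3)}
    S = [4, 5, 2, 3]
    print(decode(D, S))                    # abaabababbab
    print(count_occurrences(D, S, "aba"))  # 3
```

Concatenation composes the two child tables directly, which is why the final scan never needs the decoded text; matches that straddle token boundaries are caught because the automaton state is carried from one token to the next. Truncation is deliberately left out: the tables of X_i alone do not determine those of a truncated copy of X_i, and the general algorithm summarized above also covers that case, which this sketch does not attempt.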

Original language: English
Pages (from-to): 253-272
Number of pages: 20
Journal: Theoretical Computer Science
Volume: 298
Issue number: 1
Publication status: Published - Apr 4 2003


All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)
