Bit-parallel approach to approximate string matching in compressed texts

T. Matsumoto, T. Kida, Masayuki Takeda, A. Shinohara, S. Arikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

This paper addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that the problem fits in a single machine word, e.g. (m - k)(k + 1) ≤ L, where m is the pattern length, k is the number of allowed errors, and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k²(∥D∥ + |S|) + km) time using O(k²∥D∥) space, where ∥D∥ is the size of the dictionary D and |S| is the number of tokens in the sequence S. The LZ78 (Lempel-Ziv, 1978) and LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since ∥D∥ + |S| can be regarded as the compressed length n, the time and space complexities are O(k²n + km) and O(k²n), respectively. For general k and m, they become O(k³mn/L + km) and O(k³mn/L). Thus, our algorithm is competitive with the algorithm proposed by J. Kärkkäinen et al. (2000), which runs in O(km) time using O(kmn) space, when k = O(√L).
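To make the word-size condition and the idea of bit-parallelism concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it searches uncompressed text and uses the classic shift-and recurrence with one bit vector per error level (in the style of Wu and Manber), not a single packed word over a collage system. The function names fits_in_word and approx_search and the default word size of 64 bits are assumptions made for illustration.

```python
def fits_in_word(m, k, word_bits=64):
    """Check the abstract's condition (m - k)(k + 1) <= L for word length L."""
    return (m - k) * (k + 1) <= word_bits


def approx_search(pattern, text, k):
    """Return the end positions in text where pattern occurs with <= k errors.

    Illustrative shift-and with k errors on plain text (assumption: this
    stands in for, and is simpler than, the paper's compressed-text method).
    Bit i of R[d] is set when pattern[:i + 1] matches some suffix of the
    text read so far with at most d errors.
    """
    m = len(pattern)
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    full = (1 << m) - 1
    # With d errors available, the first d pattern characters can be deleted.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        mask = masks.get(c, 0)
        old_prev = R[0]                       # value of R[d - 1] before this step
        R[0] = ((R[0] << 1) | 1) & mask       # exact extension
        for d in range(1, k + 1):
            old_cur = R[d]
            R[d] = (
                (((old_cur << 1) | 1) & mask)  # matching character
                | old_prev                     # extra character in the text
                | (old_prev << 1)              # substituted character
                | (R[d - 1] << 1)              # missing pattern character
                | 1
            ) & full
            old_prev = old_cur
        if R[k] & accept:
            ends.append(j)
    return ends


print(fits_in_word(10, 2, 32))         # True, since 8 * 3 = 24 <= 32
print(approx_search("abc", "axc", 1))  # [2], one substitution
```

Each text character costs O(k) word operations here as long as the pattern fits in a word, which is the kind of saving that bit-parallelism provides; Python integers are unbounded, so the sketch only reports the fit condition and does not enforce it.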

Original language: English
Title of host publication: Proceedings - 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 221-228
Number of pages: 8
ISBN (Electronic): 0769507468, 9780769507460
Publication status: Published - 2000
Event: 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000 - A Coruña, Spain
Duration: Sep 27 2000 → Sep 29 2000

Other

Other: 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000
Country/Territory: Spain
City: A Coruña
Period: 9/27/00 → 9/29/00

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Signal Processing
  • Information Systems and Management
