Faster approximate string matching over compressed text

Gonzalo Navarro, Takuya Kida, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa

Research output: Contribution to journal: Article

33 Citations (Scopus)

Abstract

Approximate string matching on compressed text had been an open problem for almost a decade. The two existing solutions are very recent. Although they represent important complexity breakthroughs, in most practical cases they are not useful, in the sense that they are slower than uncompressing the text and then searching it. In this paper we present a different approach, which reduces the problem to multipattern searching of pattern pieces plus local decompression and direct verification of candidate text areas. We show experimentally that this solution is 10-30 times faster than previous work and up to three times faster than the trivial approach of uncompressing and searching, thus becoming the first practical solution to the problem.
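The standard argument behind this kind of reduction (assumed here, not quoted from the paper) is a pigeonhole one: if the pattern is cut into k + 1 pieces, any occurrence with at most k differences must contain at least one piece unaltered, so exact hits of the pieces mark the only text areas that need verification. The Python below is a minimal sketch under stated assumptions: it works on uncompressed text, uses repeated `str.find` as a stand-in for a real multipattern matcher, and verifies candidate windows with Sellers' dynamic-programming algorithm. The function names are hypothetical, and the sketch does not reproduce the paper's compressed-domain search or its local decompression.

```python
"""Sketch of a piece filter for approximate matching with at most k differences.

Assumption: operates on plain text; the compressed-domain multipattern search
and local decompression described in the abstract are NOT reproduced here.
"""


def split_pattern(pattern: str, k: int) -> list[tuple[int, str]]:
    """Cut pattern into k + 1 non-empty contiguous pieces as (offset, piece)."""
    m, parts = len(pattern), k + 1
    if m < parts:
        raise ValueError("pattern needs at least k + 1 characters")
    cuts = [i * m // parts for i in range(parts + 1)]
    return [(cuts[i], pattern[cuts[i]:cuts[i + 1]]) for i in range(parts)]


def sellers(pattern: str, text: str, k: int, base: int = 0) -> set[int]:
    """Sellers' DP: inclusive end positions (shifted by base) of matches with <= k edits."""
    m = len(pattern)
    col = list(range(m + 1))  # col[i]: best distance of pattern[:i] to a suffix read so far
    ends = set()
    for j, c in enumerate(text):
        prev_diag, col[0] = col[0], 0  # a match may start anywhere, so row 0 stays 0
        for i in range(1, m + 1):
            cur = min(prev_diag + (pattern[i - 1] != c),  # match or substitution
                      col[i] + 1,                         # extra text character
                      col[i - 1] + 1)                     # missing pattern character
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.add(base + j)
    return ends


def piece_filter_search(pattern: str, text: str, k: int) -> set[int]:
    """Find exact piece hits, then verify only the text windows around them."""
    m, n = len(pattern), len(text)
    windows = []
    for off, piece in split_pattern(pattern, k):
        # Stand-in for a multipattern matcher: repeated str.find for each piece.
        p = text.find(piece)
        while p != -1:
            # An occurrence containing this piece unaltered starts no earlier than
            # p - off - k and ends (exclusive) no later than p - off + m + k.
            windows.append((max(0, p - off - k), min(n, p - off + m + k)))
            p = text.find(piece, p + 1)
    windows.sort()
    merged: list[list[int]] = []
    for lo, hi in windows:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # overlap: extend the current area
        else:
            merged.append([lo, hi])
    ends: set[int] = set()
    for lo, hi in merged:
        ends |= sellers(pattern, text[lo:hi], k, base=lo)
    return ends


if __name__ == "__main__":
    sample = "the compressed text contains compresed words"
    print(sorted(piece_filter_search("compressed", sample, 1)))
```

Merging overlapping windows keeps each text area from being verified more than once. The reported end positions should equal those of running `sellers` over the whole text; the filter only saves work when piece hits are rare, which is the regime where this style of filtering pays off.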

Original language: English
Pages (from-to): 459-468
Number of pages: 10
Journal: Data Compression Conference Proceedings
DOI: 10.1109/DCC.2001.917177
Publication status: Published - Jan 1 2001

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Cite this

Faster approximate string matching over compressed text. / Navarro, Gonzalo; Kida, Takuya; Takeda, Masayuki; Shinohara, Ayumi; Arikawa, Setsuo.

In: Data Compression Conference Proceedings, 01.01.2001, pp. 459-468.

Navarro, Gonzalo ; Kida, Takuya ; Takeda, Masayuki ; Shinohara, Ayumi ; Arikawa, Setsuo. / Faster approximate string matching over compressed text. In: Data Compression Conference Proceedings. 2001 ; pp. 459-468.

@article{eaf5dfb35285407c93f39d467cb8826d,
title = "Faster approximate string matching over compressed text",
author = "Gonzalo Navarro and Takuya Kida and Masayuki Takeda and Ayumi Shinohara and Setsuo Arikawa",
year = "2001",
month = "1",
day = "1",
doi = "10.1109/DCC.2001.917177",
language = "English",
pages = "459--468",
journal = "Proceedings of the Data Compression Conference",
issn = "1068-0314",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Faster approximate string matching over compressed text

AU - Navarro, Gonzalo

AU - Kida, Takuya

AU - Takeda, Masayuki

AU - Shinohara, Ayumi

AU - Arikawa, Setsuo

PY - 2001/1/1

Y1 - 2001/1/1

UR - http://www.scopus.com/inward/record.url?scp=0035019839&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035019839&partnerID=8YFLogxK

U2 - 10.1109/DCC.2001.917177

DO - 10.1109/DCC.2001.917177

M3 - Article

AN - SCOPUS:0035019839

SP - 459

EP - 468

JO - Proceedings of the Data Compression Conference

JF - Proceedings of the Data Compression Conference

SN - 1068-0314

ER -