Multiple pattern matching in LZW compressed text

Takuya Kida, Masayuki Takeda, Ayumi Shinohara, Masamichi Miyazaki, Setsuo Arikawa

Research output: Contribution to journal › Article

46 Citations (Scopus)

Abstract

In this paper we address the problem of searching LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m^2 + r) time using O(n + m^2) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
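
For orientation, the baseline that the abstract compares against (decompression followed by an Aho-Corasick search) can be sketched as below. This is a minimal illustration, not the authors' compressed-domain algorithm: the function names, the choice to seed the LZW dictionary with the text's own alphabet, and the `(position, pattern)` result format are all assumptions made for this example.

```python
from collections import deque


def lzw_compress(text):
    """Encode text as LZW codes; the dictionary is seeded with the text's alphabet (assumption)."""
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decompress(alphabet, codes):
    """Rebuild the original text from LZW codes."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        # A code equal to len(table) refers to the entry that is being defined right now.
        entry = table[code] if code < len(table) else previous + previous[0]
        out.append(entry)
        table.append(previous + entry[0])
        previous = entry
    return "".join(out)


def build_aho_corasick(patterns):
    """Build goto, failure, and output tables for the pattern set."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first traversal so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, patterns):
    """Return (start_position, pattern) for every occurrence of every pattern."""
    goto, fail, output = build_aho_corasick(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


def search_compressed_baseline(alphabet, codes, patterns):
    """Baseline only: decompress fully, then run the Aho-Corasick machine."""
    return search(lzw_decompress(alphabet, codes), patterns)
```

A call such as `search_compressed_baseline(*lzw_compress(text), patterns)` reports every occurrence, but only after materialising the whole text. The algorithm described in the abstract instead simulates the machine's state transitions on the compressed codes directly, which is where the reported speed-up comes from.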

Original language: English
Pages (from-to): 103-112
Number of pages: 10
Journal: Unknown Journal
Publication status: Published - 1998

Fingerprint

Pattern matching
decompression

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Kida, T., Takeda, M., Shinohara, A., Miyazaki, M., & Arikawa, S. (1998). Multiple pattern matching in LZW compressed text. Unknown Journal, 103-112.

@article{ab9e9e8aac0f46eaa1961b07aa70e26f,
title = "Multiple pattern matching in LZW compressed text",
abstract = "In this paper we address the problem of searching LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m^2 + r) time using O(n + m^2) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.",
author = "Takuya Kida and Masayuki Takeda and Ayumi Shinohara and Masamichi Miyazaki and Setsuo Arikawa",
year = "1998",
language = "English",
pages = "103--112",
journal = "Quaternary International",
issn = "1040-6182",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Multiple pattern matching in LZW compressed text

AU - Kida, Takuya

AU - Takeda, Masayuki

AU - Shinohara, Ayumi

AU - Miyazaki, Masamichi

AU - Arikawa, Setsuo

PY - 1998

Y1 - 1998

N2 - In this paper we address the problem of searching LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m^2 + r) time using O(n + m^2) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.

UR - http://www.scopus.com/inward/record.url?scp=0031679356&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031679356&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0031679356

SP - 103

EP - 112

JO - Quaternary International

JF - Quaternary International

SN - 1040-6182

ER -