Context-sensitive grammar transform: Compression and pattern matching

Shirou Maruyama, Youhei Tanaka, Hiroshi Sakamoto, Masayuki Takeda

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

A framework of context-sensitive grammar transform for speeding up compressed pattern matching (CPM) is proposed. A greedy compression algorithm based on the transform model is presented, together with a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is comparable to those of gzip and Re-Pair. The search speed of our CPM algorithm is almost twice that of the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al. [18] and, for short patterns, exceeds that of the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. [14], which is regarded as one of the best combinations for practically fast search.
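
For orientation, the classic KMP search on uncompressed text, which KMP-type CPM algorithms extend to compressed input, can be sketched as follows. This is a minimal illustration only and not the algorithm of the paper: it does not operate on compressed data, and the function names `build_failure` and `kmp_search` are chosen here for the example.

```python
def build_failure(pattern):
    """Failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start positions of all (possibly overlapping)
    occurrences of pattern in text, scanning text once."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlaps
    return matches


if __name__ == "__main__":
    print(kmp_search("abababca", "aba"))  # [0, 2]
```

A KMP-type CPM algorithm follows the same state-machine idea but advances the matcher once per compressed symbol rather than once per character; the details of how the paper does this are not reproduced here.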

Original language: English
Pages (from-to): 219-226
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E93-D
Issue number: 2
DOI: 10.1587/transinf.E93.D.219
Publication status: Published - Jan 1 2010

Fingerprint

Context sensitive grammars
Pattern matching

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Context-sensitive grammar transform: Compression and pattern matching. / Maruyama, Shirou; Tanaka, Youhei; Sakamoto, Hiroshi; Takeda, Masayuki.

In: IEICE Transactions on Information and Systems, Vol. E93-D, No. 2, 01.01.2010, p. 219-226.

@article{509836a665704efa8d8f8475ce0feeb9,
title = "Context-sensitive grammar transform: Compression and pattern matching",
author = "Shirou Maruyama and Youhei Tanaka and Hiroshi Sakamoto and Masayuki Takeda",
year = "2010",
month = "1",
day = "1",
doi = "10.1587/transinf.E93.D.219",
language = "English",
volume = "E93-D",
pages = "219--226",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "一般社団法人電子情報通信学会",
number = "2",

}

TY - JOUR
T1 - Context-sensitive grammar transform
T2 - Compression and pattern matching
AU - Maruyama, Shirou
AU - Tanaka, Youhei
AU - Sakamoto, Hiroshi
AU - Takeda, Masayuki
PY - 2010/1/1
Y1 - 2010/1/1
UR - http://www.scopus.com/inward/record.url?scp=77950243420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950243420&partnerID=8YFLogxK
U2 - 10.1587/transinf.E93.D.219
DO - 10.1587/transinf.E93.D.219
M3 - Article
AN - SCOPUS:77950243420
VL - E93-D
SP - 219
EP - 226
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
SN - 0916-8532
IS - 2
ER -