Context-sensitive grammar transform: Compression and pattern matching

Shirou Maruyama, Yohei Tanaka, Hiroshi Sakamoto, Masayuki Takeda

Research output: Contribution to journal › Conference article

10 Citations (Scopus)

Abstract

A framework for context-sensitive grammar transforms is proposed. A greedy compression algorithm based on the transform model is presented, as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. The search speed of our CPM algorithm is almost twice as fast as that of the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000) and, for short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations for practically fast search.

Original language: English
Pages (from-to): 27-38
Number of pages: 12
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5280 LNCS
Publication status: Published - Dec 31 2008
Event: 15th International Symposium on String Processing and Information Retrieval, SPIRE 2008 - Melbourne, VIC, Australia
Duration: Nov 10 2008 to Nov 12 2008

Fingerprint

Context sensitive grammars
Pattern matching
Matching Algorithm
Grammar
Compression
Transform
Encoding
Context

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Context-sensitive grammar transform: Compression and pattern matching. / Maruyama, Shirou; Tanaka, Yohei; Sakamoto, Hiroshi; Takeda, Masayuki.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5280 LNCS, 31.12.2008, p. 27-38.


@article{572a7672c0b445d2bd57b79c32211eca,
title = "Context-sensitive grammar transform: Compression and pattern matching",
abstract = "A framework of context-sensitive grammar transform is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. The search speed of our CPM algorithm is almost twice faster than the KMP type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000), and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations that allows a practically fast search.",
author = "Shirou Maruyama and Yohei Tanaka and Hiroshi Sakamoto and Masayuki Takeda",
year = "2008",
month = "12",
day = "31",
doi = "10.1007/978-3-540-89097-3_5",
language = "English",
volume = "5280 LNCS",
pages = "27--38",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Context-sensitive grammar transform

T2 - Compression and pattern matching

AU - Maruyama, Shirou

AU - Tanaka, Yohei

AU - Sakamoto, Hiroshi

AU - Takeda, Masayuki

PY - 2008/12/31

Y1 - 2008/12/31

N2 - A framework of context-sensitive grammar transform is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. The search speed of our CPM algorithm is almost twice faster than the KMP type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000), and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations that allows a practically fast search.

AB - A framework of context-sensitive grammar transform is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. The search speed of our CPM algorithm is almost twice faster than the KMP type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000), and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations that allows a practically fast search.

UR - http://www.scopus.com/inward/record.url?scp=58049129382&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58049129382&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-89097-3_5

DO - 10.1007/978-3-540-89097-3_5

M3 - Conference article

AN - SCOPUS:58049129382

VL - 5280 LNCS

SP - 27

EP - 38

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -