Practical grammar compression based on maximal repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Contribution to journalArticle

Abstract

This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.

Original languageEnglish
Article number103
JournalAlgorithms
Volume13
Issue number4
DOIs
Publication statusPublished - Apr 1 2020

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Numerical Analysis
  • Computational Theory and Mathematics
  • Computational Mathematics

Fingerprint Dive into the research topics of 'Practical grammar compression based on maximal repeats'. Together they form a unique fingerprint.

  • Cite this