Practical grammar compression based on maximal repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

This study presents an analysis of RePair, a grammar compression algorithm known for its simple scheme and its practical effectiveness. First, we show that the main process of RePair, that is, the step-by-step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and the grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which performs a one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. Experiments comparing the size of the constructed grammars and the execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.
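As a rough illustration of the step-by-step pair substitution described above, the following is a minimal, naive RePair sketch in Python. It is not the authors' code and not the linear-time RePair implementation; it rescans the whole sequence each round. Tie-breaking among equally frequent pairs (the pair seen first wins), the nonterminal names `R0`, `R1`, …, and the brute-force helper `is_maximal_repeat` are assumptions made purely for illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the pair (a, a) at i overlaps the one at i - 1; skip it.
        if last_start.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_start[pair] = i
    return counts


def replace_pair(seq, pair, nonterminal):
    """Replace occurrences of `pair` greedily from left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(nonterminal)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive RePair: repeatedly replace the most frequent pair until no pair occurs twice."""
    seq, rules = list(text), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # max() returns the first maximal item, i.e. the pair first seen in the sequence.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        nonterminal = f"R{len(rules)}"  # hypothetical naming scheme
        rules[nonterminal] = pair
        seq = replace_pair(seq, pair, nonterminal)
    return seq, rules


def expand(symbol, rules):
    """Derive the string generated by `symbol`; terminals expand to themselves."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def is_maximal_repeat(text, s):
    """Brute-force check (hypothetical helper): s occurs at least twice and cannot be
    extended to the left or right without losing an occurrence."""
    occ = [i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)]
    if len(occ) < 2:
        return False
    lefts = {text[i - 1] if i > 0 else None for i in occ}
    rights = {text[i + len(s)] if i + len(s) < len(text) else None for i in occ}
    # Not left-maximal iff every occurrence is preceded by the same character (likewise right).
    return (len(lefts) > 1 or None in lefts) and (len(rights) > 1 or None in rights)
```

On `"abracadabra"`, this sketch creates `R0 → a b`, `R1 → R0 r`, and `R2 → R1 a`, leaving the sequence `R2 c a d R2`. All three rules build up `"abra"` one pair at a time, and `"abra"` is a maximal repeat of the text (while `"abr"` is not), which matches the behavior the abstract describes. Substituting the maximal repeat directly, in the spirit of MR-RePair, could cover it with a single rule.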

Original language: English
Article number: 103
Journal: Algorithms
Volume: 13
Issue number: 4
Publication status: Published - Apr 1 2020

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Numerical Analysis
  • Computational Theory and Mathematics
  • Computational Mathematics

