MR-RePair: Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
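To make the pair-replacement step described in the abstract concrete, the following is a minimal sketch of a naive RePair-style loop in Python. It is not the authors' implementation and it does not implement MR-RePair. The function names (`count_pairs`, `replace_pair`, `repair`, `expand`), the use of integers as nonterminals, and the tie-breaking rule (first pair encountered wins) are assumptions made for illustration. The sketch also recounts all pairs on every round, which is far slower than a practical RePair implementation.

```python
from collections import Counter


def count_pairs(seq):
    """Count adjacent pairs, skipping occurrences that overlap the previous
    counted occurrence of the same pair (e.g. 'aaa' holds one 'aa', not two)."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_pos.get(pair) == i - 1:
            continue  # overlaps the occurrence counted at position i - 1
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` with `symbol`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive RePair-style loop: repeatedly replace a most frequent pair until
    no pair occurs at least twice. Terminals are one-character strings;
    nonterminals are integers (an assumption of this sketch)."""
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Ties are broken by first occurrence (illustrative choice).
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        rules[next_id] = pair
        seq = replace_pair(seq, pair, next_id)
        next_id += 1
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if isinstance(symbol, int):
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


if __name__ == "__main__":
    seq, rules = repair("abcabcabc")
    print(seq, rules)
    print("".join(expand(s, rules) for s in seq))
```

On the toy input `abcabcabc`, this sketch introduces two rules (first for `ab`, then for that nonterminal followed by `c`), so the repeat `abc` is replaced in two consecutive steps. That is the step-by-step behaviour the abstract attributes to RePair, and the behaviour MR-RePair is designed to avoid by substituting such a repeat at once.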

Original language: English
Title of host publication: Proceedings - DCC 2019
Subtitle of host publication: 2019 Data Compression Conference
Editors: Joan Serra-Sagrista, Ali Bilgin, Michael W. Marcellin, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 508-517
Number of pages: 10
ISBN (Electronic): 9781728106571
Publication status: Published - May 10 2019
Event: 2019 Data Compression Conference, DCC 2019 - Snowbird, United States
Duration: Mar 26 2019 to Mar 29 2019

Publication series

Name: Data Compression Conference Proceedings
Volume: 2019-March
ISSN (Print): 1068-0314

Conference

Conference: 2019 Data Compression Conference, DCC 2019
Country: United States
City: Snowbird
Period: 3/26/19 to 3/29/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
