MR-RePair: Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
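To make the pair-by-pair behavior described in the abstract concrete, here is a minimal sketch of the baseline RePair idea only: repeatedly replace the most frequent adjacent pair with a fresh nonterminal. It is not the authors' implementation and it does not implement MR-RePair. The function names (`count_pairs`, `repair`, `expand`), the use of integers for nonterminals, the first-seen tie-breaking, and the quadratic-time rescanning are all assumptions made for illustration; practical RePair implementations use priority queues and linked occurrence lists to reach linear time.

```python
from collections import Counter


def count_pairs(seq):
    """Count left-to-right non-overlapping occurrences of adjacent pairs."""
    counts = Counter()
    last = None  # (index, pair) of the most recently counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the pair (a, a) at i overlaps the one at i-1.
        if pair[0] == pair[1] and last == (i - 1, pair):
            last = None
            continue
        counts[pair] += 1
        last = (i, pair)
    return counts


def repair(text):
    """Naive RePair sketch. Returns (start sequence, rules).

    Terminals are characters; nonterminals are integers 0, 1, 2, ...
    Each rule maps a nonterminal to the pair of symbols it replaces.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Ties are broken by first occurrence (an arbitrary choice here).
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        nonterminal = len(rules)
        rules[nonterminal] = pair
        # Replace occurrences greedily from left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = repair("abracadabra")
    print(seq, rules)
    print("".join(expand(s, rules) for s in seq))
```

On the input `abracadabra`, this sketch creates three chained rules (`0 → a b`, `1 → 0 r`, `2 → 1 a`) to cover the repeated substring `abra`, leaving the sequence `2 c a d 2`. That is the step-by-step pair replacement inside a repeat that the abstract refers to.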

Original language: English
Title of host publication: Proceedings - DCC 2019
Subtitle of host publication: 2019 Data Compression Conference
Editors: Joan Serra-Sagrista, Ali Bilgin, Michael W. Marcellin, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 508-517
Number of pages: 10
ISBN (Electronic): 9781728106571
DOIs: https://doi.org/10.1109/DCC.2019.00059
Publication status: Published - May 10 2019
Event: 2019 Data Compression Conference, DCC 2019 - Snowbird, United States
Duration: Mar 26 2019 → Mar 29 2019

Publication series

Name: Data Compression Conference Proceedings
Volume: 2019-March
ISSN (Print): 1068-0314

Conference

Conference: 2019 Data Compression Conference, DCC 2019
Country: United States
City: Snowbird
Period: 3/26/19 → 3/29/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications


Cite this

Furuya, I., Takagi, T., Nakashima, Y., Inenaga, S., Bannai, H., & Kida, T. (2019). MR-RePair: Grammar Compression Based on Maximal Repeats. In J. Serra-Sagrista, A. Bilgin, M. W. Marcellin, & J. A. Storer (Eds.), Proceedings - DCC 2019: 2019 Data Compression Conference (pp. 508-517). [8712661] (Data Compression Conference Proceedings; Vol. 2019-March). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DCC.2019.00059