MR-RePair: Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
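The step-by-step pair replacement described in the abstract can be illustrated with a minimal, naive sketch of RePair-style compression. This is not the authors' implementation: the function names (`count_pairs`, `repair`, `expand`), the nonterminal naming scheme `X0, X1, ...`, and the tie-breaking rule (first pair encountered wins) are assumptions made for illustration, and the quadratic rescanning is chosen for brevity rather than efficiency. MR-RePair itself is not implemented here.

```python
from collections import Counter


def count_pairs(seq):
    """Count adjacent pairs, skipping self-overlapping occurrences (e.g. "aaa")."""
    counts = Counter()
    last_counted = {}  # pair -> index of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_counted.get(pair) == i - 1:
            continue  # overlaps the occurrence counted at i - 1
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def repair(text):
    """Naive RePair sketch: repeatedly replace the most frequent pair.

    Returns (final_sequence, rules), where rules maps a nonterminal
    name to the pair of symbols it replaces.
    """
    seq = list(text)  # terminals are single characters
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Assumption: ties are broken by first occurrence in the scan.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no pair repeats, so nothing is gained by a new rule
        name = f"X{len(rules)}"  # multi-character, so distinct from terminals
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = repair("abcd1abcd2abcd3")
    print(seq)
    print(rules)
```

On the input `abcd1abcd2abcd3`, the repeat `abcd` is consumed by a chain of three pair rules, one symbol at a time. As the abstract describes it, MR-RePair would instead substitute such a maximal repeat at once, which is where the reduction in grammar size is expected to come from.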

Original language: English
Title of host publication: Proceedings - DCC 2019
Subtitle of host publication: 2019 Data Compression Conference
Editors: Joan Serra-Sagrista, Ali Bilgin, Michael W. Marcellin, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 508-517
Number of pages: 10
ISBN (Electronic): 9781728106571
DOI: https://doi.org/10.1109/DCC.2019.00059
Publication status: Published - May 10 2019
Event: 2019 Data Compression Conference, DCC 2019 - Snowbird, United States
Duration: Mar 26 2019 to Mar 29 2019

Publication series

Name: Data Compression Conference Proceedings
Volume: 2019-March
ISSN (Print): 1068-0314

Conference

Conference: 2019 Data Compression Conference, DCC 2019
Country: United States
City: Snowbird
Period: 3/26/19 to 3/29/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Cite this

Furuya, I., Takagi, T., Nakashima, Y., Inenaga, S., Bannai, H., & Kida, T. (2019). MR-RePair: Grammar Compression Based on Maximal Repeats. In J. Serra-Sagrista, A. Bilgin, M. W. Marcellin, & J. A. Storer (Eds.), Proceedings - DCC 2019: 2019 Data Compression Conference (pp. 508-517). [8712661] (Data Compression Conference Proceedings; Vol. 2019-March). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DCC.2019.00059

MR-RePair: Grammar Compression Based on Maximal Repeats. / Furuya, Isamu; Takagi, Takuya; Nakashima, Yuto; Inenaga, Shunsuke; Bannai, Hideo; Kida, Takuya.

Proceedings - DCC 2019: 2019 Data Compression Conference. ed. / Joan Serra-Sagrista; Ali Bilgin; Michael W. Marcellin; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2019. p. 508-517 8712661 (Data Compression Conference Proceedings; Vol. 2019-March).

Furuya, I, Takagi, T, Nakashima, Y, Inenaga, S, Bannai, H & Kida, T 2019, MR-RePair: Grammar Compression Based on Maximal Repeats. in J Serra-Sagrista, A Bilgin, MW Marcellin & JA Storer (eds), Proceedings - DCC 2019: 2019 Data Compression Conference, 8712661, Data Compression Conference Proceedings, vol. 2019-March, Institute of Electrical and Electronics Engineers Inc., pp. 508-517, 2019 Data Compression Conference, DCC 2019, Snowbird, United States, 3/26/19. https://doi.org/10.1109/DCC.2019.00059
Furuya I, Takagi T, Nakashima Y, Inenaga S, Bannai H, Kida T. MR-RePair: Grammar Compression Based on Maximal Repeats. In Serra-Sagrista J, Bilgin A, Marcellin MW, Storer JA, editors, Proceedings - DCC 2019: 2019 Data Compression Conference. Institute of Electrical and Electronics Engineers Inc. 2019. p. 508-517. 8712661. (Data Compression Conference Proceedings). https://doi.org/10.1109/DCC.2019.00059
Furuya, Isamu ; Takagi, Takuya ; Nakashima, Yuto ; Inenaga, Shunsuke ; Bannai, Hideo ; Kida, Takuya. / MR-RePair: Grammar Compression Based on Maximal Repeats. Proceedings - DCC 2019: 2019 Data Compression Conference. editor / Joan Serra-Sagrista ; Ali Bilgin ; Michael W. Marcellin ; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 508-517 (Data Compression Conference Proceedings).
@inproceedings{21b5bef1a8b44ff3ba4d1eea07712f48,
title = "MR-RePair: Grammar Compression Based on Maximal Repeats",
abstract = "We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.",
author = "Isamu Furuya and Takuya Takagi and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Takuya Kida",
year = "2019",
month = "5",
day = "10",
doi = "10.1109/DCC.2019.00059",
language = "English",
series = "Data Compression Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "508--517",
editor = "Joan Serra-Sagrista and Ali Bilgin and Michael W. Marcellin and James A. Storer",
booktitle = "Proceedings - DCC 2019",
address = "United States",

}

TY - GEN

T1 - MR-RePair: Grammar Compression Based on Maximal Repeats

AU - Furuya, Isamu

AU - Takagi, Takuya

AU - Nakashima, Yuto

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Kida, Takuya

PY - 2019/5/10

Y1 - 2019/5/10

N2 - We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

UR - http://www.scopus.com/inward/record.url?scp=85066340305&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066340305&partnerID=8YFLogxK

U2 - 10.1109/DCC.2019.00059

DO - 10.1109/DCC.2019.00059

M3 - Conference contribution

AN - SCOPUS:85066340305

T3 - Data Compression Conference Proceedings

SP - 508

EP - 517

BT - Proceedings - DCC 2019

A2 - Serra-Sagrista, Joan

A2 - Bilgin, Ali

A2 - Marcellin, Michael W.

A2 - Storer, James A.

PB - Institute of Electrical and Electronics Engineers Inc.

ER -