MR-RePair: Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
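
The pair-by-pair behaviour that the abstract attributes to RePair can be illustrated with a minimal sketch. This is a naive quadratic-time version written only for illustration, not the authors' implementation and not the linear-time algorithm from the RePair literature. The function names (`count_pairs`, `replace_pair`, `repair`, `expand`), the `X1, X2, ...` naming of nonterminals, and the tie-breaking rule (first pair encountered wins) are all assumptions made here.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Skip an occurrence that overlaps the previously counted one (e.g. "aaa").
        if last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace every non-overlapping occurrence of `pair` with `symbol`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive RePair: repeatedly replace a most frequent pair until none repeats."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Assumption: ties are broken by first occurrence in the sequence.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        symbol = f"X{len(rules) + 1}"  # hypothetical naming scheme
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


seq, rules = repair("abracadabra")
print(seq)    # ['X3', 'c', 'a', 'd', 'X3']
print(rules)  # {'X1': ('a', 'b'), 'X2': ('X1', 'r'), 'X3': ('X2', 'a')}
```

On this input the sketch spends three rules (`X1`, `X2`, `X3`) to build up the repeated substring `abra` one pair at a time. A single rule deriving `abra` directly would cover the same repeat, which is the kind of saving the abstract describes for MR-RePair; the exact rule-selection details of MR-RePair are in the paper and are not reproduced here.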

Original language: English
Title of host publication: Proceedings - DCC 2019
Subtitle of host publication: 2019 Data Compression Conference
Editors: Joan Serra-Sagrista, Ali Bilgin, Michael W. Marcellin, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 508-517
Number of pages: 10
ISBN (Electronic): 9781728106571
DOI: https://doi.org/10.1109/DCC.2019.00059
Publication status: Published - May 10, 2019
Event: 2019 Data Compression Conference, DCC 2019 - Snowbird, United States
Duration: March 26, 2019 to March 29, 2019

Publication series

Name: Data Compression Conference Proceedings
Volume: 2019-March
ISSN (Print): 1068-0314

Conference

Conference: 2019 Data Compression Conference, DCC 2019
Country: United States
City: Snowbird
Period: 3/26/19 to 3/29/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Cite this

Furuya, I., Takagi, T., Nakashima, Y., Inenaga, S., Bannai, H., & Kida, T. (2019). MR-RePair: Grammar Compression Based on Maximal Repeats. In J. Serra-Sagrista, A. Bilgin, M. W. Marcellin, & J. A. Storer (Eds.), Proceedings - DCC 2019: 2019 Data Compression Conference (pp. 508-517). [8712661] (Data Compression Conference Proceedings; Vol. 2019-March). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DCC.2019.00059

MR-RePair: Grammar Compression Based on Maximal Repeats. / Furuya, Isamu; Takagi, Takuya; Nakashima, Yuto; Inenaga, Shunsuke; Bannai, Hideo; Kida, Takuya.

Proceedings - DCC 2019: 2019 Data Compression Conference. ed. / Joan Serra-Sagrista; Ali Bilgin; Michael W. Marcellin; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2019. p. 508-517 8712661 (Data Compression Conference Proceedings; Vol. 2019-March).

Furuya, I, Takagi, T, Nakashima, Y, Inenaga, S, Bannai, H & Kida, T 2019, MR-RePair: Grammar Compression Based on Maximal Repeats. in J Serra-Sagrista, A Bilgin, MW Marcellin & JA Storer (eds), Proceedings - DCC 2019: 2019 Data Compression Conference, 8712661, Data Compression Conference Proceedings, vol. 2019-March, Institute of Electrical and Electronics Engineers Inc., pp. 508-517, 2019 Data Compression Conference, DCC 2019, Snowbird, United States, 3/26/19. https://doi.org/10.1109/DCC.2019.00059
Furuya I, Takagi T, Nakashima Y, Inenaga S, Bannai H, Kida T. MR-RePair: Grammar Compression Based on Maximal Repeats. In Serra-Sagrista J, Bilgin A, Marcellin MW, Storer JA, editors, Proceedings - DCC 2019: 2019 Data Compression Conference. Institute of Electrical and Electronics Engineers Inc. 2019. p. 508-517. 8712661. (Data Compression Conference Proceedings). https://doi.org/10.1109/DCC.2019.00059
Furuya, Isamu ; Takagi, Takuya ; Nakashima, Yuto ; Inenaga, Shunsuke ; Bannai, Hideo ; Kida, Takuya. / MR-RePair: Grammar Compression Based on Maximal Repeats. Proceedings - DCC 2019: 2019 Data Compression Conference. Editors / Joan Serra-Sagrista ; Ali Bilgin ; Michael W. Marcellin ; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 508-517 (Data Compression Conference Proceedings).
@inproceedings{21b5bef1a8b44ff3ba4d1eea07712f48,
title = "MR-RePair: Grammar Compression Based on Maximal Repeats",
abstract = "We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.",
author = "Isamu Furuya and Takuya Takagi and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Takuya Kida",
year = "2019",
month = "5",
day = "10",
doi = "10.1109/DCC.2019.00059",
language = "English",
series = "Data Compression Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "508--517",
editor = "Joan Serra-Sagrista and Ali Bilgin and Marcellin, {Michael W.} and Storer, {James A.}",
booktitle = "Proceedings - DCC 2019",
address = "United States",

}

TY - GEN

T1 - MR-RePair

T2 - Grammar Compression Based on Maximal Repeats

AU - Furuya, Isamu

AU - Takagi, Takuya

AU - Nakashima, Yuto

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Kida, Takuya

PY - 2019/5/10

Y1 - 2019/5/10

N2 - We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

AB - We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

UR - http://www.scopus.com/inward/record.url?scp=85066340305&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066340305&partnerID=8YFLogxK

U2 - 10.1109/DCC.2019.00059

DO - 10.1109/DCC.2019.00059

M3 - Conference contribution

AN - SCOPUS:85066340305

T3 - Data Compression Conference Proceedings

SP - 508

EP - 517

BT - Proceedings - DCC 2019

A2 - Serra-Sagrista, Joan

A2 - Bilgin, Ali

A2 - Marcellin, Michael W.

A2 - Storer, James A.

PB - Institute of Electrical and Electronics Engineers Inc.

ER -