An online algorithm for lightweight grammar-based compression

Shirou Maruyama, Masayuki Takeda, Masaya Nakahara, Hiroshi Sakamoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Grammar-based compression is a well-studied technique that constructs a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimal compression ratio and space usage. LCA, however, is an offline algorithm and relies on external storage to reduce its main-memory consumption. We therefore present an online version that inherits most characteristics of the original LCA. Our algorithm guarantees an O(log^2 n)-approximation ratio to the optimum grammar size, and all work is carried out in main memory, whose usage is bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm achieves effective compression compared with other practical compressors and that its space consumption is smaller than the input text size.
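As a rough illustration of two ideas named in the abstract, the sketch below shows a tiny CFG that uniquely derives one text, and one common way to write the shape of a binary tree as a parenthesis string. It is not the LCA algorithm or the paper's encoding: the rule table `RULES`, the helper names `expand` and `tree_shape`, and the convention "`(` for an internal node, `)` for a leaf" are assumptions chosen for illustration only.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: a terminal is a one-character str,
# a nonterminal is an int that indexes the rule table.
Symbol = Union[str, int]
Grammar = Dict[int, Tuple[Symbol, Symbol]]

# A hand-written grammar (not produced by LCA) in which every rule has
# exactly two symbols on its right-hand side.
RULES: Grammar = {
    1: ("a", "b"),  # X1 -> a b
    2: (1, 1),      # X2 -> X1 X1
    3: (2, 2),      # X3 -> X2 X2
}


def expand(symbol: Symbol, rules: Grammar) -> str:
    """Return the unique string derived from `symbol`."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def tree_shape(symbol: Symbol, rules: Grammar) -> str:
    """Preorder shape of the derivation tree: '(' = internal node, ')' = leaf."""
    if isinstance(symbol, str):
        return ")"
    left, right = rules[symbol]
    return "(" + tree_shape(left, rules) + tree_shape(right, rules)


if __name__ == "__main__":
    print(expand(3, RULES))      # the text derived from the start symbol X3
    print(tree_shape(3, RULES))  # one symbol per node of the derivation tree
```

Three rules are enough here to derive an eight-character text, which is the sense in which a small grammar compresses a repetitive input. The shape string uses one symbol per tree node, so a tree with k internal nodes is written with 2k + 1 symbols.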

Original language: English
Title of host publication: Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011
Pages: 19-28
Number of pages: 10
DOIs: https://doi.org/10.1109/CCP.2011.40
Publication status: Published - Nov 21 2011
Event: 1st International Conference on Data Compression, Communication, and Processing, CCP 2011 - Palinuro, Cilento Coast, Italy
Duration: Jun 21 2011 → Jun 24 2011

Publication series

Name: Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011

Other

Other: 1st International Conference on Data Compression, Communication, and Processing, CCP 2011
Country: Italy
City: Palinuro, Cilento Coast
Period: 6/21/11 → 6/24/11

Fingerprint

Context free grammars
Binary trees
Compressors
Data storage equipment

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems

Cite this

Maruyama, S., Takeda, M., Nakahara, M., & Sakamoto, H. (2011). An online algorithm for lightweight grammar-based compression. In Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011 (pp. 19-28). [6061023] (Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011). https://doi.org/10.1109/CCP.2011.40

@inproceedings{1be51445b8744aad9b27e95261cd050e,
title = "An online algorithm for lightweight grammar-based compression",
abstract = "Grammar-based compression is a well-studied technique that constructs a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimal compression ratio and space usage. LCA, however, is an offline algorithm and relies on external storage to reduce its main-memory consumption. We therefore present an online version that inherits most characteristics of the original LCA. Our algorithm guarantees an O(log^2 n)-approximation ratio to the optimum grammar size, and all work is carried out in main memory, whose usage is bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm achieves effective compression compared with other practical compressors and that its space consumption is smaller than the input text size.",
author = "Shirou Maruyama and Masayuki Takeda and Masaya Nakahara and Hiroshi Sakamoto",
year = "2011",
month = "11",
day = "21",
doi = "10.1109/CCP.2011.40",
language = "English",
isbn = "9780769545288",
series = "Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011",
pages = "19--28",
booktitle = "Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011",

}

TY - GEN

T1 - An online algorithm for lightweight grammar-based compression

AU - Maruyama, Shirou

AU - Takeda, Masayuki

AU - Nakahara, Masaya

AU - Sakamoto, Hiroshi

PY - 2011/11/21

Y1 - 2011/11/21

N2 - Grammar-based compression is a well-studied technique that constructs a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimal compression ratio and space usage. LCA, however, is an offline algorithm and relies on external storage to reduce its main-memory consumption. We therefore present an online version that inherits most characteristics of the original LCA. Our algorithm guarantees an O(log^2 n)-approximation ratio to the optimum grammar size, and all work is carried out in main memory, whose usage is bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm achieves effective compression compared with other practical compressors and that its space consumption is smaller than the input text size.

UR - http://www.scopus.com/inward/record.url?scp=81255164860&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=81255164860&partnerID=8YFLogxK

U2 - 10.1109/CCP.2011.40

DO - 10.1109/CCP.2011.40

M3 - Conference contribution

SN - 9780769545288

T3 - Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011

SP - 19

EP - 28

BT - Proceedings - 1st International Conference on Data Compression, Communication, and Processing, CCP 2011

ER -