LZD factorization: Simple and practical online grammar compression with variable-to-fixed encoding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

We propose a new variant of the LZ78 factorization which we call the LZ Double-factor factorization (LZD factorization). Each factor of the LZD factorization of a string is the concatenation of the two longest previous factors, while each factor of the LZ78 factorization is that of the longest previous factor and the following character. Interestingly, this simple modification drastically improves the compression ratio in practice. We propose two online algorithms to compute the LZD factorization in O(m(M + min(m, M) log σ)) time and O(m) space, or in O(N log σ) time and O(N) space, where m is the number of factors to output, M is the length of the longest factor(s), N is the length of the input string, and σ is the alphabet size. We also show two versions of our LZD factorization with variable-to-fixed encoding, and present online algorithms to compute these versions in O(N + min(m, 2^L)(M + min(m, M, 2^L) log σ)) time and O(min(2^L, m)) space, where L is the bit-length of each fixed-length code word. The LZD factorization and its versions with variable-to-fixed encoding are actually grammar-based compression, and our experiments show that our algorithms outperform the state-of-the-art online grammar-based compression algorithms on several data sets.
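
The difference between the two factorizations described in the abstract can be illustrated with a minimal Python sketch. It is a naive reading of the definitions, not the authors' algorithms, and it does not achieve the time or space bounds stated above. It assumes that a single character is used whenever no previous factor matches, and that the second half of the final LZD factor may be empty when the input runs out. The function names (`longest_known_prefix`, `lzd_factorize`, `lz78_factorize`) are hypothetical and chosen only for this illustration.

```python
def longest_known_prefix(s, pos, known):
    """Return the longest prefix of s[pos:] that occurs in `known`.

    Assumption: falls back to the single character s[pos] when nothing in
    `known` matches, and to the empty string at the end of the input.
    """
    if pos >= len(s):
        return ""
    best = s[pos]
    for f in known:
        if len(f) > len(best) and s.startswith(f, pos):
            best = f
    return best


def lzd_factorize(s):
    """Naive LZD: each factor is two longest previous factors concatenated."""
    factors, known, pos = [], set(), 0
    while pos < len(s):
        first = longest_known_prefix(s, pos, known)
        second = longest_known_prefix(s, pos + len(first), known)
        factor = first + second
        factors.append(factor)
        known.add(factor)
        pos += len(factor)
    return factors


def lz78_factorize(s):
    """Naive LZ78: longest previous factor plus the following character."""
    factors, known, pos = [], set(), 0
    while pos < len(s):
        prev = ""
        for f in known:
            if len(f) > len(prev) and s.startswith(f, pos):
                prev = f
        # The slice is truncated automatically at the end of the input.
        factor = s[pos:pos + len(prev) + 1]
        factors.append(factor)
        known.add(factor)
        pos += len(factor)
    return factors


if __name__ == "__main__":
    text = "abaaabababaabb"
    print(lzd_factorize(text))   # ['ab', 'aa', 'abab', 'abaa', 'bb']
    print(lz78_factorize(text))  # ['a', 'b', 'aa', 'ab', 'aba', 'ba', 'abb']
```

On this toy input the LZD sketch emits 5 factors where the LZ78 sketch emits 7, because an LZD factor can grow by a whole previous factor rather than by one character. Each LZD factor is a pair of earlier factors (or characters), which is why the factorization can be read as a grammar with one binary production per factor.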

Original language: English
Title of host publication: Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings
Editors: Ugo Vaccaro, Ely Porat, Ferdinando Cicalese
Publisher: Springer Verlag
Pages: 219-230
Number of pages: 12
ISBN (Print): 9783319199283
DOIs: https://doi.org/10.1007/978-3-319-19929-0_19
Publication status: Published - Jan 1 2015
Event: 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015 - Ischia Island, Italy
Duration: Jun 29 2015 to Jul 1 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9133
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015
Country: Italy
City: Ischia Island
Period: 6/29/15 to 7/1/15

Fingerprint

Factorization
Grammar
Encoding
Compression
Online Algorithms
Strings
Concatenation
Experiments
Output

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Goto, K., Bannai, H., Inenaga, S., & Takeda, M. (2015). LZD factorization: Simple and practical online grammar compression with variable-to-fixed encoding. In U. Vaccaro, E. Porat, & F. Cicalese (Eds.), Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings (pp. 219-230). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9133). Springer Verlag. https://doi.org/10.1007/978-3-319-19929-0_19

@inproceedings{5373e6b069bf47eb81ce1040bd4c7ef6,
title = "LZD factorization: Simple and practical online grammar compression with variable-to-fixed encoding",
abstract = "We propose a new variant of the LZ78 factorization which we call the LZ Double-factor factorization (LZD factorization). Each factor of the LZD factorization of a string is the concatenation of the two longest previous factors, while each factor of the LZ78 factorization is that of the longest previous factor and the following character. Interestingly, this simple modification drastically improves the compression ratio in practice. We propose two online algorithms to compute the LZD factorization in O(m(M + min(m, M) log σ)) time and O(m) space, or in O(N log σ) time and O(N) space, where m is the number of factors to output, M is the length of the longest factor(s), N is the length of the input string, and σ is the alphabet size. We also show two versions of our LZD factorization with variable-to-fixed encoding, and present online algorithms to compute these versions in O(N + min(m, 2^L)(M + min(m, M, 2^L) log σ)) time and O(min(2^L, m)) space, where L is the bit-length of each fixed-length code word. The LZD factorization and its versions with variable-to-fixed encoding are actually grammar-based compression, and our experiments show that our algorithms outperform the state-of-the-art online grammar-based compression algorithms on several data sets.",
author = "Keisuke Goto and Hideo Bannai and Shunsuke Inenaga and Masayuki Takeda",
year = "2015",
month = "1",
day = "1",
doi = "10.1007/978-3-319-19929-0_19",
language = "English",
isbn = "9783319199283",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "219--230",
editor = "Ugo Vaccaro and Ely Porat and Ferdinando Cicalese",
booktitle = "Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings",
address = "Germany",

}
