### Abstract

We propose a new variant of the LZ78 factorization which we call the LZ Double-factor factorization (LZD factorization). Each factor of the LZD factorization of a string is the concatenation of the two longest previous factors, whereas each factor of the LZ78 factorization is the concatenation of the longest previous factor and the following character. Interestingly, this simple modification drastically improves the compression ratio in practice. We propose two online algorithms to compute the LZD factorization in O(m(M + min(m, M) log σ)) time and O(m) space, or in O(N log σ) time and O(N) space, where m is the number of factors to output, M is the length of the longest factor(s), N is the length of the input string, and σ is the alphabet size. We also show two versions of our LZD factorization with variable-to-fixed encoding, and present online algorithms to compute these versions in O(N + min(m, 2^{L})(M + min(m, M, 2^{L}) log σ)) time and O(min(2^{L}, m)) space, where L is the bit-length of each fixed-length code word. The LZD factorization and its versions with variable-to-fixed encoding are in fact forms of grammar-based compression, and our experiments show that our algorithms outperform the state-of-the-art online grammar-based compression algorithms on several data sets.
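The contrast between the two factorizations can be illustrated with a minimal Python sketch. It is a naive greedy search written for readability, not either of the online algorithms whose bounds are stated above. Two details are assumptions not spelled out in the abstract: when no previous factor matches, a single character is used as a fallback, and the second half of the last LZD factor may be empty at the end of the string. The function names `lzd_factorize` and `lz78_factorize` are hypothetical.

```python
def lzd_factorize(s: str) -> list[tuple[str, str]]:
    """Naive LZD-style factorization: each factor is a pair (first, second)."""
    factors: set[str] = set()  # factors emitted so far
    out: list[tuple[str, str]] = []
    pos = 0

    def longest_match(start: int) -> str:
        # Longest previous factor that is a prefix of s[start:].
        # Assumption: fall back to a single character if none matches,
        # and return the empty string at the end of the input.
        if start >= len(s):
            return ""
        best = s[start]
        for f in factors:
            if len(f) > len(best) and s.startswith(f, start):
                best = f
        return best

    while pos < len(s):
        first = longest_match(pos)
        second = longest_match(pos + len(first))
        out.append((first, second))
        factors.add(first + second)
        pos += len(first) + len(second)
    return out


def lz78_factorize(s: str) -> list[tuple[str, str]]:
    """Naive LZ78 factorization: longest previous factor plus the next character."""
    factors: set[str] = set()
    out: list[tuple[str, str]] = []
    pos = 0
    while pos < len(s):
        best = ""
        for f in factors:
            if len(f) > len(best) and s.startswith(f, pos):
                best = f
        # The following character may be missing at the end of the string.
        nxt = s[pos + len(best): pos + len(best) + 1]
        out.append((best, nxt))
        factors.add(best + nxt)
        pos += len(best) + len(nxt)
    return out


if __name__ == "__main__":
    text = "abaaabababab"
    print(lzd_factorize(text))   # 4 factors: ab, aa, abab, abab
    print(lz78_factorize(text))  # 7 factors: a, b, aa, ab, aba, ba, b
```

On this toy input the LZD sketch produces 4 factors against 7 for LZ78, because each LZD factor can reuse two long earlier factors at once. A single short example says nothing about compression ratios in general.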

Original language | English |
---|---|
Title of host publication | Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings |
Editors | Ugo Vaccaro, Ely Porat, Ferdinando Cicalese |
Publisher | Springer Verlag |
Pages | 219-230 |
Number of pages | 12 |
ISBN (Print) | 9783319199283 |
DOIs | https://doi.org/10.1007/978-3-319-19929-0_19 |
Publication status | Published - 2015 |
Event | 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015 - Ischia Island, Italy. Duration: Jun 29 2015 → Jul 1 2015 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 9133 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Other

Other | 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015 |
---|---|
Country | Italy |
City | Ischia Island |
Period | 6/29/15 → 7/1/15 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Computer Science (all)

## Cite this

LZD factorization: Simple and practical online grammar compression with variable-to-fixed encoding. In *Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings* (pp. 219-230). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9133). Springer Verlag. https://doi.org/10.1007/978-3-319-19929-0_19