### Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in time and space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N − α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N/log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.
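To make the two objects in the abstract concrete, the following is a minimal sketch of an SLP and of LZ78 factorization. It is not the algorithm of the paper: it first decompresses the SLP and then runs the textbook trie-based LZ78 on the plain string, which is the uncompressed baseline the abstract compares against. The names `expand_slp` and `lz78_factorize`, the dictionary encoding of the grammar, and the example rules are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding: each variable maps either to a single
# terminal character or to a pair of variables (Chomsky normal form).
Rule = Union[str, Tuple[str, str]]


def expand_slp(rules: Dict[str, Rule], start: str) -> str:
    """Decompress the SLP: return the single string derived from `start`.

    Uses memoized recursion, so it is only suitable for small examples.
    """
    memo: Dict[str, str] = {}

    def expand(var: str) -> str:
        if var not in memo:
            rhs = rules[var]
            if isinstance(rhs, str):
                memo[var] = rhs                    # rule X -> a
            else:
                left, right = rhs
                memo[var] = expand(left) + expand(right)  # rule X -> Y Z
        return memo[var]

    return expand(start)


def lz78_factorize(text: str) -> List[str]:
    """Textbook LZ78 on an uncompressed string, using a dict-based trie.

    Each factor is the longest previously seen factor that prefixes the
    remaining text, extended by one character (the last factor may lack
    the extension if the text ends first).
    """
    trie: dict = {}          # root node: char -> child node
    factors: List[str] = []
    i = 0
    while i < len(text):
        node = trie
        j = i
        # Walk down the trie as long as the text matches an earlier factor.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
        # Extend by one fresh character and record the new factor.
        if j < len(text):
            node[text[j]] = {}
            j += 1
        factors.append(text[i:j])
        i = j
    return factors


# Example SLP with n = 6 rules deriving a string of length N = 8.
rules: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}
S = expand_slp(rules, "X6")
factors = lz78_factorize(S)
```

In this example the six rules derive `abaababa`, and the factorization has m = 5 factors. The baseline above spends time proportional to N on decompression alone, which is the cost that working directly on the SLP aims to avoid when the text is compressible.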

Original language | English |
---|---|
Title of host publication | String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings |
Pages | 86-98 |
Number of pages | 13 |
Publication status | Published - Oct 22 2012 |
Event | 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 - Cartagena de Indias, Colombia; Duration: Oct 21 2012 → Oct 25 2012 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 7608 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Other

Other | 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 |
---|---|
Country | Colombia |
City | Cartagena de Indias |
Period | 10/21/12 → 10/25/12 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Computer Science (all)

