TY - GEN

T1 - Efficient LZ78 factorization of grammar compressed text

AU - Bannai, Hideo

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

PY - 2012/10/22

Y1 - 2012/10/22

N2 - We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N - α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e., when m and n are small.

AB - We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N - α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e., when m and n are small.

UR - http://www.scopus.com/inward/record.url?scp=84867496904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867496904&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-34109-0_10

DO - 10.1007/978-3-642-34109-0_10

M3 - Conference contribution

AN - SCOPUS:84867496904

SN - 9783642341083

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 86

EP - 98

BT - String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings

T2 - 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012

Y2 - 20 October 2012 through 24 October 2012

ER -