TY - GEN

T1 - Speeding up q-gram mining on grammar-based compressed texts

AU - Goto, Keisuke

AU - Bannai, Hideo

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

N1 - Copyright: Copyright 2012 Elsevier B.V., All rights reserved.

PY - 2012

Y1 - 2012

N2 - We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP 𝒯 of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| - dup(q, 𝒯), where dup(q, 𝒯) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{|T| - dup(q, 𝒯), qn}), improving our previous O(qn) algorithm when q = Ω(|T|/n).

AB - We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP 𝒯 of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| - dup(q, 𝒯), where dup(q, 𝒯) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{|T| - dup(q, 𝒯), qn}), improving our previous O(qn) algorithm when q = Ω(|T|/n).

UR - http://www.scopus.com/inward/record.url?scp=84863090446&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863090446&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-31265-6_18

DO - 10.1007/978-3-642-31265-6_18

M3 - Conference contribution

AN - SCOPUS:84863090446

SN - 9783642312649

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 220

EP - 231

BT - Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings

T2 - 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012

Y2 - 3 July 2012 through 5 July 2012

ER -