TY - GEN
T1 - Speeding up q-gram mining on grammar-based compressed texts
AU - Goto, Keisuke
AU - Bannai, Hideo
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
N2 - We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T, by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| - dup(q, T), where dup(q, T) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{|T| - dup(q, T), qn}), improving our previous O(qn) algorithm when q = Ω(|T|/n).
AB - We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T, by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| - dup(q, T), where dup(q, T) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{|T| - dup(q, T), qn}), improving our previous O(qn) algorithm when q = Ω(|T|/n).
UR - http://www.scopus.com/inward/record.url?scp=84863090446&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863090446&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31265-6_18
DO - 10.1007/978-3-642-31265-6_18
M3 - Conference contribution
AN - SCOPUS:84863090446
SN - 9783642312649
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 220
EP - 231
BT - Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings
T2 - 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012
Y2 - 3 July 2012 through 5 July 2012
ER -