TY - GEN
T1 - Fast q-gram mining on SLP compressed strings
AU - Goto, Keisuke
AU - Bannai, Hideo
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
PY - 2011
Y1 - 2011
N2 - We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.
UR - http://www.scopus.com/inward/record.url?scp=80053998246&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053998246&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-24583-1_27
DO - 10.1007/978-3-642-24583-1_27
M3 - Conference contribution
AN - SCOPUS:80053998246
SN - 9783642245824
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 278
EP - 289
BT - String Processing and Information Retrieval - 18th International Symposium, SPIRE 2011, Proceedings
T2 - 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011
Y2 - 17 October 2011 through 21 October 2011
ER -