Fast q-gram mining on SLP compressed strings

Research output: Contribution to journal (article)

11 citations (Scopus)

Abstract

We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.
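The counting idea behind the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rule encoding (a list in which each entry is either a one-character string or a pair of earlier indices) and the names `qgram_frequencies` and `expand` are assumptions made for illustration. The sketch relies on the observation that, for q ≥ 2, every q-gram occurrence in T crosses the boundary between the two children of exactly one node of the derivation tree, so it suffices to count q-grams in a short boundary string per variable, weighted by how often that variable occurs in the derivation. It uses plain hash-based counting and does not attempt to match the O(qn) bound stated above.

```python
from collections import Counter


def qgram_frequencies(rules, q):
    """Count q-gram occurrences in the string derived by an SLP.

    Assumed (hypothetical) encoding: rules[i] is either a one-character
    string (X_i -> a) or a pair (l, r) with l, r < i (X_i -> X_l X_r).
    The last rule is the start symbol.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    n = len(rules)
    k = q - 1
    pre = [""] * n  # prefix of X_i's expansion, at most q-1 characters
    suf = [""] * n  # suffix of X_i's expansion, at most q-1 characters
    for i, rule in enumerate(rules):
        if isinstance(rule, str):
            pre[i] = rule[:k]
            suf[i] = rule[-k:] if k else ""
        else:
            l, r = rule
            pre[i] = (pre[l] + pre[r])[:k]
            suf[i] = (suf[l] + suf[r])[-k:] if k else ""

    # vocc[i]: number of nodes labelled X_i in the derivation tree of T.
    vocc = [0] * n
    vocc[n - 1] = 1
    for i in range(n - 1, -1, -1):
        if not isinstance(rules[i], str):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    counts = Counter()
    for i, rule in enumerate(rules):
        if vocc[i] == 0:
            continue  # variable not reachable from the start symbol
        if isinstance(rule, str):
            if q == 1:
                counts[rule] += vocc[i]
        else:
            l, r = rule
            # Every q-gram in this short string spans the Y/Z boundary.
            t = suf[l] + pre[r]
            for j in range(len(t) - q + 1):
                counts[t[j:j + q]] += vocc[i]
    return counts


def expand(rules):
    """Decompress the SLP; only sensible for small examples."""
    out = []
    for rule in rules:
        out.append(rule if isinstance(rule, str) else out[rule[0]] + out[rule[1]])
    return out[-1]


# Example SLP deriving the Fibonacci word "abaababa".
rules = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
print(expand(rules), dict(qgram_frequencies(rules, 2)))
```

The `expand` helper is included only to check small inputs against a direct count on the uncompressed string; the counting function itself never decompresses T.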

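Original language: English
Pages (from-to): 89-99
Number of pages: 11
Journal: Journal of Discrete Algorithms
Volume: 18
DOI: 10.1016/j.jda.2012.07.006
Publication status: Published - 1 Jan 2013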

Fingerprint

Straight-line Programs
Mining
Strings
Data Classification
Computational Experiments
Data Mining
Efficient Algorithms
Experiments

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics

Cite this

Fast q-gram mining on SLP compressed strings. / Goto, Keisuke; Bannai, Hideo; Inenaga, Shunsuke; Takeda, Masayuki.

In: Journal of Discrete Algorithms, Vol. 18, 01.01.2013, p. 89-99.

@article{27a39b7825bb4ff3928797305677aedc,
title = "Fast q-gram mining on SLP compressed strings",
abstract = "We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.",
author = "Keisuke Goto and Hideo Bannai and Shunsuke Inenaga and Masayuki Takeda",
year = "2013",
month = "1",
day = "1",
doi = "10.1016/j.jda.2012.07.006",
language = "English",
volume = "18",
pages = "89--99",
journal = "Journal of Discrete Algorithms",
issn = "1570-8667",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Fast q-gram mining on SLP compressed strings
AU  - Goto, Keisuke
AU  - Bannai, Hideo
AU  - Inenaga, Shunsuke
AU  - Takeda, Masayuki
PY  - 2013/1/1
Y1  - 2013/1/1
N2  - We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.
AB  - We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.
UR  - http://www.scopus.com/inward/record.url?scp=84872118069&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84872118069&partnerID=8YFLogxK
U2  - 10.1016/j.jda.2012.07.006
DO  - 10.1016/j.jda.2012.07.006
M3  - Article
VL  - 18
SP  - 89
EP  - 99
JO  - Journal of Discrete Algorithms
JF  - Journal of Discrete Algorithms
SN  - 1570-8667
ER  -