Speeding up q-gram mining on grammar-based compressed texts

Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents a string T, the algorithm computes the occurrence frequencies of all q-grams in T by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| − dup(q, T), where dup(q, T) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{|T| − dup(q, T), qn}), improving our previous O(qn) algorithm when q = Ω(|T|/n).
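
To make the problem setting concrete, here is a minimal Python sketch of q-gram counting directly on an SLP. It is not the trie-based reduction described in the abstract. It only illustrates the simpler weighting idea: for each rule X_i -> X_l X_r, the q-grams that cross the boundary between X_l and X_r are counted once and weighted by the number of times X_i occurs in the derivation tree. The rule encoding (a list whose entries are either a single character or a pair of earlier indices) and the name `qgram_freqs_slp` are assumptions made for this example. The boundary strings have total length O(qn), but the counting here is naive and does not aim for the running times stated above.

```python
from collections import Counter


def qgram_freqs_slp(rules, q):
    """Count q-grams of the string derived by rules[-1] without expanding it.

    rules[i] is either a single character (terminal rule) or a pair (l, r)
    with l, r < i, meaning X_i -> X_l X_r. This encoding is hypothetical.
    """
    n = len(rules)
    k = q - 1
    pre = [""] * n  # first min(k, |X_i|) characters derived from X_i
    suf = [""] * n  # last  min(k, |X_i|) characters derived from X_i
    for i, rule in enumerate(rules):
        if isinstance(rule, str):
            pre[i] = suf[i] = rule[:k]
        else:
            l, r = rule
            pre[i] = (pre[l] + pre[r])[:k]
            s = suf[l] + suf[r]
            suf[i] = s[max(0, len(s) - k):]

    # vocc[i]: number of occurrences of X_i in the derivation tree of rules[-1].
    # Parents always have larger indices, so a reverse scan is a valid order.
    vocc = [0] * n
    vocc[-1] = 1
    for i in range(n - 1, -1, -1):
        if not isinstance(rules[i], str):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    freqs = Counter()
    for i, rule in enumerate(rules):
        if vocc[i] == 0:
            continue  # variable not reachable from the start rule
        if isinstance(rule, str):
            if q == 1:
                freqs[rule] += vocc[i]
            continue
        l, r = rule
        # Every length-q window of t crosses the X_l / X_r boundary,
        # because each side contributes at most q - 1 characters.
        t = suf[l] + pre[r]
        for j in range(len(t) - q + 1):
            freqs[t[j:j + q]] += vocc[i]
    return freqs


# Example SLP deriving "abaab": X0=a, X1=b, X2=X0 X1, X3=X2 X0, X4=X3 X2
example_rules = ["a", "b", (0, 1), (2, 0), (3, 2)]
example_freqs = qgram_freqs_slp(example_rules, 2)
```

For the example SLP, which derives "abaab", the 2-gram "ab" is found once at the boundary of X2 and weighted by the two occurrences of X2 in the derivation tree, which is the kind of redundancy a grammar-based method can exploit.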

Original language: English
Title of host publication: Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings
Pages: 220-231
Number of pages: 12
Publication status: Published - 2012
Event: 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012 - Helsinki, Finland
Duration: Jul 3 2012 to Jul 5 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7354 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
