### Abstract

We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight-line program (SLP). Given an SLP of size n that represents a string T, the algorithm computes the occurrence frequencies of all q-grams in T by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| − dup(q, T), where dup(q, T) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in time linear in m. Since m = O(qn), the running time of our algorithm is O(min{|T| − dup(q, T), qn}), improving on our previous O(qn) algorithm when q = Ω(|T|/n).
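
To make the problem statement concrete, the following is a minimal sketch of counting q-grams directly on an SLP without decompressing it. It is not the algorithm of this paper (it builds no trie-like structure of size m); it illustrates the simpler boundary-crossing idea that yields an O(qn)-style bound. The rule encoding (a list whose entries are either a single character or a pair of earlier rule indices, with the last rule as the start symbol) and the name `qgram_freq_slp` are assumptions made for illustration.

```python
from collections import Counter

def qgram_freq_slp(rules, q):
    """Count q-gram occurrences in the string derived by an SLP.

    Assumed encoding (hypothetical, for illustration only):
      rules[i] is a single character (terminal rule) or a pair (l, r)
      with l, r < i (rule X_i -> X_l X_r). The last rule is the start symbol.
    """
    n = len(rules)
    k = q - 1
    pre = [""] * n   # first min(k, |X_i|) characters of each expansion
    suf = [""] * n   # last  min(k, |X_i|) characters of each expansion
    for i, rule in enumerate(rules):
        if isinstance(rule, str):
            pre[i] = rule[:k]
            suf[i] = rule[-k:] if k > 0 else ""
        else:
            l, r = rule
            pre[i] = (pre[l] + pre[r])[:k]
            suf[i] = (suf[l] + suf[r])[-k:] if k > 0 else ""

    # vocc[i]: how many times X_i appears in the derivation tree of the start symbol.
    vocc = [0] * n
    vocc[n - 1] = 1
    for i in range(n - 1, -1, -1):
        if not isinstance(rules[i], str):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    counts = Counter()
    for i, rule in enumerate(rules):
        if vocc[i] == 0:
            continue  # rule is unreachable from the start symbol
        if isinstance(rule, str):
            if q == 1:
                counts[rule] += vocc[i]
        else:
            l, r = rule
            # Every q-gram of this short string crosses the X_l / X_r boundary,
            # and every boundary-crossing q-gram of X_i lies inside it.
            s = suf[l] + pre[r]
            for j in range(len(s) - q + 1):
                counts[s[j:j + q]] += vocc[i]
    return counts

# Example SLP deriving "abababab": X0=a, X1=b, X2=X0 X1, X3=X2 X2, X4=X3 X3
example_rules = ["a", "b", (0, 1), (2, 2), (3, 3)]
example_counts = qgram_freq_slp(example_rules, 2)
```

For the example SLP, `example_counts` records 4 occurrences of `ab` and 3 of `ba`. Each occurrence is attributed to the unique lowest rule whose boundary it crosses and is weighted by how often that rule occurs in the derivation tree. Each rule therefore contributes a string of length at most 2(q − 1), which is where the qn factor comes from.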

Original language | English |
---|---|
Title of host publication | Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings |
Pages | 220-231 |
Number of pages | 12 |
DOIs | https://doi.org/10.1007/978-3-642-31265-6_18 |
Publication status | Published - 2012 |
Event | 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012 - Helsinki, Finland. Duration: Jul 3 2012 → Jul 5 2012 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 7354 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Other

Other | 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012 |
---|---|
Country | Finland |
City | Helsinki |
Period | 7/3/12 → 7/5/12 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Computer Science (all)

## Fingerprint

Dive into the research topics of 'Speeding up q-gram mining on grammar-based compressed texts'. Together they form a unique fingerprint.

## Cite this

*Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings*(pp. 220-231). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7354 LNCS). https://doi.org/10.1007/978-3-642-31265-6_18