Faster subsequence and don't-care pattern matching on compressed texts

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

15 Citations (Scopus)

Abstract

Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127-136), where the principal problem is: given a string T represented as a straight line program (SLP) of size n and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al. This improves on the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable length don't care) symbols, as well as for patterns containing FLDC (fixed length don't care) symbols, on SLP compressed texts.
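
To make the problem statement concrete, the following is a minimal sketch of what "minimal subsequence occurrences" means. It is not the O(nm) algorithm of the paper: it fully decompresses the SLP and then scans the text naively. The SLP encoding (a dict mapping variable names to a character or a pair of variables) and the names `expand`, `earliest_end`, and `count_minimal_occurrences` are assumptions made for illustration only. A window T[i..j] is counted when P is a subsequence of T[i..j] but of neither T[i+1..j] nor T[i..j-1].

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical SLP encoding: each variable derives either one character
# or the concatenation of two previously defined variables.
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], start: str) -> str:
    """Decompress the SLP (exponential in the worst case; illustration only)."""
    memo: Dict[str, str] = {}

    def go(var: str) -> str:
        if var not in memo:
            rule = slp[var]
            memo[var] = rule if isinstance(rule, str) else go(rule[0]) + go(rule[1])
        return memo[var]

    return go(start)


def earliest_end(text: str, pattern: str, start: int) -> Optional[int]:
    """Smallest j such that pattern is a subsequence of text[start..j], or None."""
    k = 0
    for j in range(start, len(text)):
        if text[j] == pattern[k]:
            k += 1
            if k == len(pattern):
                return j
    return None


def count_minimal_occurrences(text: str, pattern: str) -> int:
    """Count windows that contain pattern as a subsequence and cannot be shrunk."""
    if not pattern:
        return 0
    ends = [earliest_end(text, pattern, i) for i in range(len(text))]
    count = 0
    for i, end in enumerate(ends):
        if end is None:
            break  # no occurrence starts at i or later
        nxt = ends[i + 1] if i + 1 < len(text) else None
        # [i, end] is minimal iff dropping text[i] forces a strictly later end.
        if nxt is None or nxt > end:
            count += 1
    return count


# Example SLP deriving "ababab" (assumed toy input).
slp = {"X1": "a", "X2": "b", "X3": ("X1", "X2"), "X4": ("X3", "X3"), "X5": ("X4", "X3")}
text = expand(slp, "X5")
print(text, count_minimal_occurrences(text, "ab"))
```

This baseline takes time quadratic in the length of the decompressed text, which can be exponential in the SLP size n. That gap is what motivates algorithms that work directly on the compressed representation, as the abstract describes.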

Original language: English
Title of host publication: Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings
Pages: 309-322
Number of pages: 14
DOIs: https://doi.org/10.1007/978-3-642-21458-5_27
Publication status: Published - Jul 13 2011
Event: 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011 - Palermo, Italy
Duration: Jun 27 2011 to Jun 29 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6661 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011
Country: Italy
City: Palermo
Period: 6/27/11 to 6/29/11

Fingerprint

Pattern matching
Subsequence
Straight-line Programs
Strings
Matching Problem
Matching Algorithm
Time Complexity
Text
Range of data

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yamamoto, T., Bannai, H., Inenaga, S., & Takeda, M. (2011). Faster subsequence and don't-care pattern matching on compressed texts. In Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings (pp. 309-322). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6661 LNCS). https://doi.org/10.1007/978-3-642-21458-5_27

@inproceedings{117875b2c12c4228a9582ac670776873,
title = "Faster subsequence and don't-care pattern matching on compressed texts",
abstract = "Subsequence pattern matching problems on compressed text were first considered by C{\'e}gielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127-136), where the principal problem is: given a string T represented as a straight line program (SLP) of size n and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by C{\'e}gielski et al. This improves on the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable length don't care) symbols, as well as for patterns containing FLDC (fixed length don't care) symbols, on SLP compressed texts.",
author = "Takanori Yamamoto and Hideo Bannai and Shunsuke Inenaga and Masayuki Takeda",
year = "2011",
month = "7",
day = "13",
doi = "10.1007/978-3-642-21458-5_27",
language = "English",
isbn = "9783642214578",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "309--322",
booktitle = "Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings",

}

TY  - GEN
T1  - Faster subsequence and don't-care pattern matching on compressed texts
AU  - Yamamoto, Takanori
AU  - Bannai, Hideo
AU  - Inenaga, Shunsuke
AU  - Takeda, Masayuki
PY  - 2011/7/13
Y1  - 2011/7/13
N2  - Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127-136), where the principal problem is: given a string T represented as a straight line program (SLP) of size n and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al. This improves on the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable length don't care) symbols, as well as for patterns containing FLDC (fixed length don't care) symbols, on SLP compressed texts.
UR  - http://www.scopus.com/inward/record.url?scp=79960081284&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79960081284&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-21458-5_27
DO  - 10.1007/978-3-642-21458-5_27
M3  - Conference contribution
AN  - SCOPUS:79960081284
SN  - 9783642214578
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 309
EP  - 322
BT  - Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings
ER  -