TY - GEN

T1 - Online Algorithms for Finding Distinct Substrings with Length and Multiple Prefix and Suffix Conditions

AU - Leonard, Laurentius

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Mieno, Takuya

N1 - Funding Information:
Acknowledgements. This work was supported by JSPS KAKENHI Grant Numbers JP20H04141 (HB) and JP22H03551 (SI), and by JST PRESTO Grant Number JPMJPR1922 (SI).
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2022

Y1 - 2022

N2 - Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set ans_i of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |ans_i|, whereas the reporting problem is to output all elements of ans_i, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖A‖ = ∑_{u∈A} |u|. Then, we show that after O((‖P‖ + ‖S‖) log σ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| log σ) and O(|T_i| log σ + |ans_i|) total time. The preprocessing time can be reduced to O(‖P‖ + ‖S‖) for integer alphabets of size polynomial with regard to ‖P‖ + ‖S‖. Our algorithms have possible applications to network traffic classification.

AB - Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set ans_i of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |ans_i|, whereas the reporting problem is to output all elements of ans_i, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖A‖ = ∑_{u∈A} |u|. Then, we show that after O((‖P‖ + ‖S‖) log σ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| log σ) and O(|T_i| log σ + |ans_i|) total time. The preprocessing time can be reduced to O(‖P‖ + ‖S‖) for integer alphabets of size polynomial with regard to ‖P‖ + ‖S‖. Our algorithms have possible applications to network traffic classification.

UR - http://www.scopus.com/inward/record.url?scp=85142767950&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85142767950&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-20643-6_3

DO - 10.1007/978-3-031-20643-6_3

M3 - Conference contribution

AN - SCOPUS:85142767950

SN - 9783031206429

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 24

EP - 37

BT - String Processing and Information Retrieval - 29th International Symposium, SPIRE 2022, Proceedings

A2 - Arroyuelo, Diego

A2 - Poblete, Barbara

PB - Springer Science and Business Media Deutschland GmbH

T2 - 29th International Symposium on String Processing and Information Retrieval, SPIRE 2022

Y2 - 8 November 2022 through 10 November 2022

ER -