TY - GEN
T1 - Online Algorithms for Finding Distinct Substrings with Length and Multiple Prefix and Suffix Conditions
AU - Leonard, Laurentius
AU - Inenaga, Shunsuke
AU - Bannai, Hideo
AU - Mieno, Takuya
N1 - Funding Information:
Acknowledgements. This work was supported by JSPS KAKENHI Grant Numbers JP20H04141 (HB) and JP22H03551 (SI), and by JST PRESTO Grant Number JPMJPR1922 (SI).
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set ans_i of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |ans_i|, whereas the reporting problem is to output all elements of ans_i, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖A‖ = ∑_{u∈A} |u|. Then, we show that after O((‖P‖ + ‖S‖) log σ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| log σ) and O(|T_i| log σ + |ans_i|) total time. The preprocessing time can be reduced to O(‖P‖ + ‖S‖) for integer alphabets of size polynomial with regard to ‖P‖ + ‖S‖. Our algorithms have possible applications to network traffic classification.
AB - Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set ans_i of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |ans_i|, whereas the reporting problem is to output all elements of ans_i, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖A‖ = ∑_{u∈A} |u|. Then, we show that after O((‖P‖ + ‖S‖) log σ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| log σ) and O(|T_i| log σ + |ans_i|) total time. The preprocessing time can be reduced to O(‖P‖ + ‖S‖) for integer alphabets of size polynomial with regard to ‖P‖ + ‖S‖. Our algorithms have possible applications to network traffic classification.
UR - http://www.scopus.com/inward/record.url?scp=85142767950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142767950&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-20643-6_3
DO - 10.1007/978-3-031-20643-6_3
M3 - Conference contribution
AN - SCOPUS:85142767950
SN - 9783031206429
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 24
EP - 37
BT - String Processing and Information Retrieval - 29th International Symposium, SPIRE 2022, Proceedings
A2 - Arroyuelo, Diego
A2 - Poblete, Barbara
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Symposium on String Processing and Information Retrieval, SPIRE 2022
Y2 - 8 November 2022 through 10 November 2022
ER -