TY - JOUR
T1 - Space-efficient algorithms for computing minimal/shortest unique substrings
AU - Mieno, Takuya
AU - Köppl, Dominik
AU - Nakashima, Yuto
AU - Inenaga, Shunsuke
AU - Bannai, Hideo
AU - Takeda, Masayuki
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP18F18120 (KD), JP18K18002 (YN), JP17H01697 (SI), JP16H02783 (HB), JP18H04098 (MT), and by JST PRESTO Grant Number JPMJPR1922 (SI).
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020
Y1 - 2020
N2 - Given a string T of length n, a substring u=T[i..j] of T is called a shortest unique substring (SUS) for an interval [s,t] if (a) u occurs exactly once in T, (b) u contains the interval [s,t] (i.e. i≤s≤t≤j), and (c) every substring v of T with |v|<|u| containing [s,t] occurs at least twice in T. Given a query interval [s,t]⊂[1,n], the interval SUS problem is to output all the SUSs for the interval [s,t]. In this article, we propose a 4n+o(n) bits data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s=t. Here, we propose a ⌈(log₂ 3+1)n⌉+o(n) bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of T.
AB - Given a string T of length n, a substring u=T[i..j] of T is called a shortest unique substring (SUS) for an interval [s,t] if (a) u occurs exactly once in T, (b) u contains the interval [s,t] (i.e. i≤s≤t≤j), and (c) every substring v of T with |v|<|u| containing [s,t] occurs at least twice in T. Given a query interval [s,t]⊂[1,n], the interval SUS problem is to output all the SUSs for the interval [s,t]. In this article, we propose a 4n+o(n) bits data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s=t. Here, we propose a ⌈(log₂ 3+1)n⌉+o(n) bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of T.
UR - http://www.scopus.com/inward/record.url?scp=85091069182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091069182&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2020.09.017
DO - 10.1016/j.tcs.2020.09.017
M3 - Article
AN - SCOPUS:85091069182
SN - 0304-3975
VL - 845
SP - 230
EP - 242
JO - Theoretical Computer Science
JF - Theoretical Computer Science
ER -