TY - JOUR

T1 - Space-efficient algorithms for computing minimal/shortest unique substrings

AU - Mieno, Takuya

AU - Köppl, Dominik

AU - Nakashima, Yuto

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Takeda, Masayuki

N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP18F18120 (KD), JP18K18002 (YN), JP17H01697 (SI), JP16H02783 (HB), JP18H04098 (MT), and by JST PRESTO Grant Number JPMJPR1922 (SI).
Publisher Copyright:
© 2020 Elsevier B.V.

PY - 2020

Y1 - 2020

N2 - Given a string T of length n, a substring u=T[i..j] of T is called a shortest unique substring (SUS) for an interval [s,t] if (a) u occurs exactly once in T, (b) u contains the interval [s,t] (i.e. i≤s≤t≤j), and (c) every substring v of T with |v|<|u| containing [s,t] occurs at least twice in T. Given a query interval [s,t]⊂[1,n], the interval SUS problem is to output all the SUSs for the interval [s,t]. In this article, we propose a 4n+o(n) bits data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s=t. Here, we propose a ⌈(log₂3+1)n⌉+o(n) bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of T.

AB - Given a string T of length n, a substring u=T[i..j] of T is called a shortest unique substring (SUS) for an interval [s,t] if (a) u occurs exactly once in T, (b) u contains the interval [s,t] (i.e. i≤s≤t≤j), and (c) every substring v of T with |v|<|u| containing [s,t] occurs at least twice in T. Given a query interval [s,t]⊂[1,n], the interval SUS problem is to output all the SUSs for the interval [s,t]. In this article, we propose a 4n+o(n) bits data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s=t. Here, we propose a ⌈(log₂3+1)n⌉+o(n) bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of T.

UR - http://www.scopus.com/inward/record.url?scp=85091069182&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85091069182&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2020.09.017

DO - 10.1016/j.tcs.2020.09.017

M3 - Article

AN - SCOPUS:85091069182

VL - 845

SP - 230

EP - 242

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -