Tight bounds on the maximum number of shortest unique substrings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
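
The SUS definition in the abstract can be illustrated with a small brute-force sketch. This is not the O(n)-time preprocessing scheme the abstract refers to; it is a direct, quadratic-or-worse reading of the definition, intended only for experimenting with short strings. The function names and the 0-based, inclusive indexing are assumptions made for this illustration and do not come from the paper.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(
        1
        for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def shortest_unique_substrings(text, s, t):
    """Return all SUSs for interval [s, t] as (start, end) pairs.

    Indices are 0-based and inclusive (an assumption of this sketch).
    Candidate lengths are tried in increasing order, so the first length
    that yields a unique substring covering [s, t] is the SUS length.
    """
    n = len(text)
    if not (0 <= s <= t < n):
        raise ValueError("need 0 <= s <= t < len(text)")
    for length in range(t - s + 1, n + 1):
        found = []
        # A window of this length covers [s, t] iff start <= s and end >= t.
        first_start = max(0, t - length + 1)
        last_start = min(s, n - length)
        for start in range(first_start, last_start + 1):
            if count_occurrences(text, text[start:start + length]) == 1:
                found.append((start, start + length - 1))
        if found:
            return found
    return []  # unreachable for non-empty text: the whole string is unique


def distinct_point_sus_intervals(text):
    """Collect the distinct intervals that are point SUSs for some position."""
    intervals = set()
    for p in range(len(text)):
        intervals.update(shortest_unique_substrings(text, p, p))
    return intervals


if __name__ == "__main__":
    example = "abba"
    for p in range(len(example)):
        print(p, shortest_unique_substrings(example, p, p))
    print(len(distinct_point_sus_intervals(example)), "distinct point SUS intervals")
```

For the string `abba`, position 1 has two point SUSs of equal length (`ab` and `bb`), which shows why a query may have more than one answer. Counting distinct intervals over all positions with `distinct_point_sus_intervals` gives a quantity that can be compared against the 1.5n bound stated in the abstract.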

Original language: English
Title of host publication: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Publisher: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Volume: 78
ISBN (Electronic): 9783959770392
DOI: https://doi.org/10.4230/LIPIcs.CPM.2017.24
Publication status: Published - Jul 1 2017
Event: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 - Warsaw, Poland
Duration: Jul 4 2017 to Jul 6 2017

Other

Other: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Country: Poland
City: Warsaw
Period: 7/4/17 to 7/6/17

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Mieno, T., Inenaga, S., Bannai, H., & Takeda, M. (2017). Tight bounds on the maximum number of shortest unique substrings. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 (Vol. 78). [24] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.CPM.2017.24

@inproceedings{f47232554911436e8d0f048a7aed81c2,
title = "Tight bounds on the maximum number of shortest unique substrings",
abstract = "A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.",
author = "Takuya Mieno and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda",
year = "2017",
month = "7",
day = "1",
doi = "10.4230/LIPIcs.CPM.2017.24",
language = "English",
volume = "78",
booktitle = "28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",

}

TY - GEN

T1 - Tight bounds on the maximum number of shortest unique substrings

AU - Mieno, Takuya

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Takeda, Masayuki

PY - 2017/7/1

Y1 - 2017/7/1

AB - A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

UR - http://www.scopus.com/inward/record.url?scp=85027271339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027271339&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.CPM.2017.24

DO - 10.4230/LIPIcs.CPM.2017.24

M3 - Conference contribution

AN - SCOPUS:85027271339

VL - 78

BT - 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -