Tight bounds on the maximum number of shortest unique substrings

Research output: Contribution to book/report, conference contribution

Abstract

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S that contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t], all the SUSs for interval [s, t] can be reported quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
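To make the definition in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the O(n)-time preprocessing scheme mentioned above, only a direct reading of the definition. The function names (`occurrences`, `interval_sus`, `point_sus_intervals`) and the use of 0-based inclusive positions are assumptions made for this illustration and do not come from the paper.

```python
def occurrences(S: str, Q: str) -> int:
    """Count the (possibly overlapping) occurrences of Q in S."""
    return sum(1 for i in range(len(S) - len(Q) + 1) if S.startswith(Q, i))


def interval_sus(S: str, s: int, t: int) -> list:
    """Return all SUSs for the 0-based inclusive interval [s, t].

    Each SUS is reported as an inclusive (start, end) pair of positions in S.
    Candidate lengths are tried in increasing order, so the first length that
    yields a unique substring covering [s, t] is the shortest one.
    """
    n = len(S)
    for length in range(t - s + 1, n + 1):
        # Start positions i such that the window [i, i + length - 1]
        # lies inside S and contains [s, t].
        lo = max(0, t - length + 1)
        hi = min(s, n - length)
        found = [
            (i, i + length - 1)
            for i in range(lo, hi + 1)
            if occurrences(S, S[i:i + length]) == 1
        ]
        if found:
            return found
    return []  # only reached when S is empty or [s, t] is invalid


def point_sus_intervals(S: str) -> set:
    """Collect the distinct intervals that are point SUSs for some position."""
    return {iv for p in range(len(S)) for iv in interval_sus(S, p, p)}


if __name__ == "__main__":
    S = "aab"
    print(interval_sus(S, 1, 1))           # [(0, 1), (1, 2)]: "aa" and "ab"
    print(sorted(point_sus_intervals(S)))  # [(0, 1), (1, 2), (2, 2)]
```

For S = "aab" (n = 3) the sketch finds 3 distinct point-SUS intervals, which is below 1.5n = 4.5, in line with the bound stated in the abstract.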

Original language: English
Title of host publication: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Publisher: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Volume: 78
ISBN (Electronic): 9783959770392
DOI: https://doi.org/10.4230/LIPIcs.CPM.2017.24
Publication status: Published - Jul 1 2017
Event: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 - Warsaw, Poland
Duration: Jul 4 2017 to Jul 6 2017

Other

Other: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Country: Poland
City: Warsaw
Period: 7/4/17 to 7/6/17

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Mieno, T., Inenaga, S., Bannai, H., & Takeda, M. (2017). Tight bounds on the maximum number of shortest unique substrings. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 (Vol. 78). [24] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.CPM.2017.24

Tight bounds on the maximum number of shortest unique substrings. / Mieno, Takuya; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki.

28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017. Vol. 78 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017. 24.

Mieno, T, Inenaga, S, Bannai, H & Takeda, M 2017, Tight bounds on the maximum number of shortest unique substrings. in 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017. vol. 78, 24, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, Warsaw, Poland, 7/4/17. https://doi.org/10.4230/LIPIcs.CPM.2017.24
Mieno T, Inenaga S, Bannai H, Takeda M. Tight bounds on the maximum number of shortest unique substrings. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017. Vol. 78. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2017. 24 https://doi.org/10.4230/LIPIcs.CPM.2017.24
Mieno, Takuya ; Inenaga, Shunsuke ; Bannai, Hideo ; Takeda, Masayuki. / Tight bounds on the maximum number of shortest unique substrings. 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017. Vol. 78 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017.
@inproceedings{f47232554911436e8d0f048a7aed81c2,
title = "Tight bounds on the maximum number of shortest unique substrings",
abstract = "A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S that contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t], all the SUSs for interval [s, t] can be reported quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.",
author = "Takuya Mieno and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda",
year = "2017",
month = "7",
day = "1",
doi = "10.4230/LIPIcs.CPM.2017.24",
language = "English",
volume = "78",
booktitle = "28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",

}

TY - GEN

T1 - Tight bounds on the maximum number of shortest unique substrings

AU - Mieno, Takuya

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Takeda, Masayuki

PY - 2017/7/1

Y1 - 2017/7/1

N2 - A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S that contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t], all the SUSs for interval [s, t] can be reported quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

UR - http://www.scopus.com/inward/record.url?scp=85027271339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027271339&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.CPM.2017.24

DO - 10.4230/LIPIcs.CPM.2017.24

M3 - Conference contribution

VL - 78

BT - 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -