A new characterization of maximal repetitions by Lyndon trees

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs ρ(n) in a string of length n. Furthermore, we show an upper bound of ρ(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Ilie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.
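The abstract does not spell out the construction or the algorithm. As a hedged illustration only, the Python sketch below builds a Lyndon tree by recursive standard factorization (splitting a Lyndon word w = uv where v is the longest proper suffix of w that is Lyndon) and enumerates runs under an assumed formulation of the characterization: for each position and for each of two letter orders (the usual order and its reverse), take the longest Lyndon word starting there as a candidate period, extend that period as far as possible in both directions, and keep the interval if its exponent is at least 2. All function names (`is_lyndon`, `lyndon_tree`, `longest_lyndon_end`, `compute_runs`) are hypothetical. This brute-force version runs in roughly quartic time; it is not the paper's linear-time algorithm, and, like it, it does not use a Lempel-Ziv factorization.

```python
"""Brute-force sketch: Lyndon trees and runs via longest Lyndon words."""


def is_lyndon(x):
    """True if x is non-empty and strictly smaller than each proper suffix."""
    return len(x) > 0 and all(x < x[k:] for k in range(1, len(x)))


def lyndon_tree(w):
    """Lyndon tree of a Lyndon word w via recursive standard factorization.

    A leaf is a single letter; an internal node is a pair (tree(u), tree(v))
    where w = uv and v is the longest proper suffix of w that is Lyndon.
    """
    assert is_lyndon(w), "the Lyndon tree is only defined for Lyndon words"
    if len(w) == 1:
        return w
    # The first k whose suffix w[k:] is Lyndon gives the longest such suffix.
    k = next(k for k in range(1, len(w)) if is_lyndon(w[k:]))
    return (lyndon_tree(w[:k]), lyndon_tree(w[k:]))


def longest_lyndon_end(x, s):
    """Inclusive end index of the longest Lyndon word starting at x[s]."""
    for t in range(len(x) - 1, s - 1, -1):
        if is_lyndon(x[s:t + 1]):
            return t  # always reached by t == s, since one letter is Lyndon


def compute_runs(w):
    """All runs of w as (begin, end, period) triples, 0-based and inclusive.

    Assumption: every run has, for one of the two letter orders, a Lyndon
    root that is the longest Lyndon word starting at its own position.
    """
    n = len(w)
    runs = set()
    # The usual letter order (sign 1) and its reverse (negated code points).
    for sign in (1, -1):
        x = [sign * ord(c) for c in w]
        for s in range(n):
            t = longest_lyndon_end(x, s)
            p = t - s + 1  # candidate period
            b, e = s, t
            # Grow the interval while it keeps period p on each side.
            while b > 0 and w[b - 1] == w[b - 1 + p]:
                b -= 1
            while e + 1 < n and w[e + 1] == w[e + 1 - p]:
                e += 1
            if e - b + 1 >= 2 * p:  # exponent at least 2: record the run
                runs.add((b, e, p))
    return sorted(runs)


if __name__ == "__main__":
    print(lyndon_tree("aabab"))
    print(compute_runs("mississippi"))
```

Running the module prints the Lyndon tree of `aabab` and the four runs of `mississippi`: `ss` twice, `pp`, and `ississi` with period 3. Because a candidate is a Lyndon (hence primitive) word, an interval with period p and length at least 2p has smallest period exactly p, so each recorded triple is a genuine run.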

Original language: English
Title of host publication: Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015
Publisher: Association for Computing Machinery
Pages: 562-571
Number of pages: 10
Edition: January
ISBN (Electronic): 9781611973747
Publication status: Published - Jan 1 2015
Event: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 - San Diego, United States
Duration: Jan 4 2015 to Jan 6 2015

Publication series

Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
Number: January
Volume: 2015-January

Other

Other: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015
Country: United States
City: San Diego
Period: 1/4/15 to 1/6/15

All Science Journal Classification (ASJC) codes

  • Software
  • Mathematics (all)

Cite this

Bannai, H., I, T., Inenaga, S., Nakashima, Y., Takeda, M., & Tsuruta, K. (2015). A new characterization of maximal repetitions by Lyndon trees. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 (January ed., pp. 562-571). (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms; Vol. 2015-January, No. January). Association for Computing Machinery.

A new characterization of maximal repetitions by Lyndon trees. / Bannai, Hideo; I, Tomohiro; Inenaga, Shunsuke; Nakashima, Yuto; Takeda, Masayuki; Tsuruta, Kazuya.

Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015. January. ed. Association for Computing Machinery, 2015. p. 562-571 (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms; Vol. 2015-January, No. January).

Bannai, H, I, T, Inenaga, S, Nakashima, Y, Takeda, M & Tsuruta, K 2015, A new characterization of maximal repetitions by Lyndon trees. in Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015. January edn, Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, no. January, vol. 2015-January, Association for Computing Machinery, pp. 562-571, 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, United States, 1/4/15.
Bannai H, I T, Inenaga S, Nakashima Y, Takeda M, Tsuruta K. A new characterization of maximal repetitions by Lyndon trees. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015. January ed. Association for Computing Machinery. 2015. p. 562-571. (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms; January).
Bannai, Hideo ; I, Tomohiro ; Inenaga, Shunsuke ; Nakashima, Yuto ; Takeda, Masayuki ; Tsuruta, Kazuya. / A new characterization of maximal repetitions by Lyndon trees. Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015. January. ed. Association for Computing Machinery, 2015. pp. 562-571 (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms; January).
@inproceedings{a7fa91f24f7e40efa956a1b18b4381c1,
title = "A new characterization of maximal repetitions by Lyndon trees",
abstract = "We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs $\rho(n)$ in a string of length n. Furthermore, we show an upper bound of $\rho(n) < 1.5n$, which improves on the best upper bound 1.6n (Crochemore & Ilie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.",
author = "Hideo Bannai and Tomohiro I and Shunsuke Inenaga and Yuto Nakashima and Masayuki Takeda and Kazuya Tsuruta",
year = "2015",
month = "1",
day = "1",
language = "English",
series = "Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms",
publisher = "Association for Computing Machinery",
number = "January",
pages = "562--571",
booktitle = "Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015",
edition = "January",

}

TY - GEN

T1 - A new characterization of maximal repetitions by Lyndon trees

AU - Bannai, Hideo

AU - I, Tomohiro

AU - Inenaga, Shunsuke

AU - Nakashima, Yuto

AU - Takeda, Masayuki

AU - Tsuruta, Kazuya

PY - 2015/1/1

Y1 - 2015/1/1

AB - We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs ρ(n) in a string of length n. Furthermore, we show an upper bound of ρ(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Ilie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.

UR - http://www.scopus.com/inward/record.url?scp=84938221386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938221386&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 562

EP - 571

BT - Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015

PB - Association for Computing Machinery

ER -