Computing DAWGs and minimal absent words in linear time for integer alphabets

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
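The two objects in the abstract, the DAWG and the set MAW(y), can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' method: it builds the DAWG with the classic online construction of Blumer et al., using hash-map transitions (expected linear time, not the paper's worst-case O(n) algorithm for integer alphabets), and it lists MAW(y) with a simple scan over suffix links in the spirit of the DAWG-based approach of Crochemore, Mignosi and Restivo, which costs O(nσ) rather than the optimal O(n + |MAW(y)|) bound. The scan relies on one fact: if p is a non-initial node with suffix link q and u is the shortest string of p, then u·b is a minimal absent word exactly when q has an edge labelled b and p does not. The names `DAWG`, `contains`, `minimal_absent_words` and the `alphabet` parameter are hypothetical, and the alphabet defaults to the letters occurring in y.

```python
from typing import Dict, Iterable, List, Optional


class DAWG:
    """DAWG (suffix automaton) of y, built with Blumer et al.'s online algorithm.

    Sketch only: transitions live in Python dicts, so the running time is
    expected linear, not the paper's worst-case O(n) bound for integer alphabets.
    """

    def __init__(self, y: str) -> None:
        self.y = y
        # Node 0 is the initial node; it represents the empty string.
        self.trans: List[Dict[str, int]] = [{}]
        self.link: List[int] = [-1]      # suffix link (failure link)
        self.length: List[int] = [0]     # length of the longest string in the node
        self.firstpos: List[int] = [-1]  # end position of its first occurrence in y
        last = 0
        for i, c in enumerate(y):
            cur = self._new_node(self.length[last] + 1, i)
            p = last
            # Add edges labelled c from the nodes of suffixes of y[:i] that lack one.
            while p != -1 and c not in self.trans[p]:
                self.trans[p][c] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.trans[p][c]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone takes over the shorter strings of q.
                    clone = self._new_node(self.length[p] + 1, self.firstpos[q])
                    self.trans[clone] = dict(self.trans[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.trans[p].get(c) == q:
                        self.trans[p][c] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_node(self, length: int, firstpos: int) -> int:
        self.trans.append({})
        self.link.append(-1)
        self.length.append(length)
        self.firstpos.append(firstpos)
        return len(self.trans) - 1

    def contains(self, s: str) -> bool:
        """Return True iff s occurs in y (every path from node 0 spells a factor)."""
        node = 0
        for c in s:
            node = self.trans[node].get(c, -1)
            if node == -1:
                return False
        return True


def minimal_absent_words(y: str, alphabet: Optional[Iterable[str]] = None) -> List[str]:
    """Return MAW(y) in sorted order; the alphabet defaults to the letters of y."""
    dawg = DAWG(y)
    sigma = set(y) if alphabet is None else set(alphabet)
    # A letter of the alphabet that never occurs in y is itself a MAW.
    result = [c for c in sigma if c not in dawg.trans[0]]
    for p in range(1, len(dawg.trans)):
        q = dawg.link[p]
        end = dawg.firstpos[p]
        # Shortest string of p: the suffix of length len(q) + 1 ending at `end`.
        shortest = y[end - dawg.length[q]: end + 1]
        for b in dawg.trans[q]:
            if b not in dawg.trans[p]:
                result.append(shortest + b)
    return sorted(result)


if __name__ == "__main__":
    d = DAWG("abaab")
    print(len(d.trans), sum(len(t) for t in d.trans))
    print(minimal_absent_words("abaab"))
```

For y = abaab the sketch builds a DAWG with 6 nodes and 7 edges and reports MAW(y) = {aaa, aaba, bab, bb}; each of these words is absent from y while both its longest proper prefix and its longest proper suffix occur in y.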

Original language: English
Title of host publication: 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016
Editors: Anca Muscholl, Piotr Faliszewski, Rolf Niedermeier
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959770163
DOI: 10.4230/LIPIcs.MFCS.2016.38
Publication status: Published - Aug 1 2016
Event: 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016 - Krakow, Poland
Duration: Aug 22 2016 to Aug 26 2016

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 58
ISSN (Print): 1868-8969

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Fujishige, Y., Tsujimaru, Y., Inenaga, S., Bannai, H., & Takeda, M. (2016). Computing DAWGs and minimal absent words in linear time for integer alphabets. In A. Muscholl, P. Faliszewski, & R. Niedermeier (Eds.), 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016 [38] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 58). Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.MFCS.2016.38

Computing DAWGs and minimal absent words in linear time for integer alphabets. / Fujishige, Yuta; Tsujimaru, Yuki; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki.

41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016. ed. / Anca Muscholl; Piotr Faliszewski; Rolf Niedermeier. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2016. 38 (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 58).

Fujishige, Y, Tsujimaru, Y, Inenaga, S, Bannai, H & Takeda, M 2016, Computing DAWGs and minimal absent words in linear time for integer alphabets. in A Muscholl, P Faliszewski & R Niedermeier (eds), 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016, 38, Leibniz International Proceedings in Informatics, LIPIcs, vol. 58, Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016, Krakow, Poland, 8/22/16. https://doi.org/10.4230/LIPIcs.MFCS.2016.38
Fujishige Y, Tsujimaru Y, Inenaga S, Bannai H, Takeda M. Computing DAWGs and minimal absent words in linear time for integer alphabets. In Muscholl A, Faliszewski P, Niedermeier R, editors, 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. 2016. 38. (Leibniz International Proceedings in Informatics, LIPIcs). https://doi.org/10.4230/LIPIcs.MFCS.2016.38
Fujishige, Yuta ; Tsujimaru, Yuki ; Inenaga, Shunsuke ; Bannai, Hideo ; Takeda, Masayuki. / Computing DAWGs and minimal absent words in linear time for integer alphabets. 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016. editor / Anca Muscholl ; Piotr Faliszewski ; Rolf Niedermeier. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2016. (Leibniz International Proceedings in Informatics, LIPIcs).
@inproceedings{108cb1326cde4de1bec57fe0ff262195,
title = "Computing {DAWGs} and minimal absent words in linear time for integer alphabets",
abstract = "The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.",
author = "Yuta Fujishige and Yuki Tsujimaru and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda",
year = "2016",
month = aug,
day = "1",
doi = "10.4230/LIPIcs.MFCS.2016.38",
isbn = "9783959770163",
language = "English",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
volume = "58",
publisher = "Schloss Dagstuhl - Leibniz-Zentrum f{\"u}r Informatik GmbH, Dagstuhl Publishing",
editor = "Anca Muscholl and Piotr Faliszewski and Rolf Niedermeier",
booktitle = "41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016",
}

TY  - GEN
T1  - Computing DAWGs and minimal absent words in linear time for integer alphabets
AU  - Fujishige, Yuta
AU  - Tsujimaru, Yuki
AU  - Inenaga, Shunsuke
AU  - Bannai, Hideo
AU  - Takeda, Masayuki
PY  - 2016/8/1
Y1  - 2016/8/1
N2  - The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
AB  - The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
UR  - http://www.scopus.com/inward/record.url?scp=85012877585&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85012877585&partnerID=8YFLogxK
U2  - 10.4230/LIPIcs.MFCS.2016.38
DO  - 10.4230/LIPIcs.MFCS.2016.38
M3  - Conference contribution
AN  - SCOPUS:85012877585
T3  - Leibniz International Proceedings in Informatics, LIPIcs
VL  - 58
BT  - 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016
A2  - Muscholl, Anca
A2  - Faliszewski, Piotr
A2  - Niedermeier, Rolf
PB  - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ER  - 