Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

7 Citations (Scopus)

Abstract

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e^r_T log n) bits of space, which improves the pattern matching time of Belazzougui et al.'s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e^r_T is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
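To make the notion of a straight-line program (SLP) mentioned in the abstract concrete, the following is a minimal Python sketch of a generic SLP: a grammar in which every rule is either a single character or the concatenation of two earlier rules, so that the last rule derives exactly one string. It is not the construction from the paper. The function names (`fibonacci_slp`, `expansion_lengths`, `access`, `expand`) and the Fibonacci-word example are assumptions chosen for illustration, and the random access shown here takes time proportional to the height of the grammar rather than the O(log n) bound stated for L-CDAWG edge labels.

```python
from typing import List, Tuple, Union

# A rule is a terminal character, or a pair (left, right) of indices of
# earlier rules whose expansions are concatenated. (Illustrative encoding.)
Rule = Union[str, Tuple[int, int]]


def fibonacci_slp(num_rules: int) -> List[Rule]:
    """Build a toy SLP whose last rule derives a Fibonacci word."""
    rules: List[Rule] = ["b", "a"]
    for k in range(2, num_rules):
        rules.append((k - 1, k - 2))  # X_k -> X_{k-1} X_{k-2}
    return rules


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each rule, computed bottom-up."""
    lens: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lens.append(1)
        else:
            left, right = rule
            lens.append(lens[left] + lens[right])
    return lens


def access(rules: List[Rule], lens: List[int], i: int) -> str:
    """Return character i of the text derived by the last rule,
    descending the derivation tree without expanding the text."""
    v = len(rules) - 1
    if not 0 <= i < lens[v]:
        raise IndexError("position out of range")
    while not isinstance(rules[v], str):
        left, right = rules[v]
        if i < lens[left]:
            v = left            # position falls in the left child
        else:
            i -= lens[left]     # skip the left child's expansion
            v = right
    return rules[v]


def expand(rules: List[Rule], v: int) -> str:
    """Fully expand rule v (for checking only; costs time linear in the output)."""
    rule = rules[v]
    if isinstance(rule, str):
        return rule
    return expand(rules, rule[0]) + expand(rules, rule[1])
```

In this toy grammar each added rule lengthens the derived text by a Fibonacci step, which illustrates why a grammar can be far smaller than a highly repetitive text. For instance, `access(rules, expansion_lengths(rules), i)` with `rules = fibonacci_slp(6)` reads one character of the derived text without materializing the whole string.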

Original language: English
Title of host publication: String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Proceedings
Editors: Rossano Venturini, Gabriele Fici, Marinella Sciortino
Publisher: Springer Verlag
Pages: 304-316
Number of pages: 13
ISBN (Print): 9783319674278
DOIs: https://doi.org/10.1007/978-3-319-67428-5_26
Publication status: Published - Jan 1 2017
Event: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017 - Palermo, Italy
Duration: Sep 26 2017 to Sep 29 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10508 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017
Country: Italy
City: Palermo
Period: 9/26/17 to 9/29/17

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., & Arimura, H. (2017). Linear-size CDAWG: New repetition-aware indexing and grammar compression. In R. Venturini, G. Fici, & M. Sciortino (Eds.), String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Proceedings (pp. 304-316). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10508 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-67428-5_26