Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Citations (Scopus)


In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e_T^r log n) bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e_T^r is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
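For readers unfamiliar with straight-line programs, the following is a minimal generic sketch of an SLP and of random access into the text it derives. It is not the construction from the paper: the rule encoding and the function names (`slp_lengths`, `slp_access`, `slp_expand`) are assumptions made for illustration only. Access here takes time proportional to the height of the derivation tree, which is logarithmic only when the grammar is balanced; the O(log n) bound stated in the abstract relies on the paper's own data structure.

```python
# Illustrative SLP (hypothetical encoding, not the paper's construction).
# Rule i is either a single terminal character, or a pair (j, k) with
# j, k < i, meaning "rule i derives the text of rule j followed by rule k".

def slp_lengths(rules):
    """Length of the text derived by each rule, computed in one pass."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def slp_access(rules, lengths, root, i):
    """Return character i (0-based) of the text derived by rule `root`
    without decompressing it; cost is the depth of the descent."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left            # position falls in the left child
        else:
            i -= lengths[left]     # skip the left child's text
            node = right
    return rules[node]


def slp_expand(rules, root):
    """Fully decompress rule `root` (for checking small examples only)."""
    expanded = []
    for rule in rules[: root + 1]:
        if isinstance(rule, str):
            expanded.append(rule)
        else:
            left, right = rule
            expanded.append(expanded[left] + expanded[right])
    return expanded[root]


# Five rules deriving the repetitive text "ababab".
RULES = ["a", "b", (0, 1), (2, 2), (3, 2)]
LENGTHS = slp_lengths(RULES)
TEXT = slp_expand(RULES, 4)
```

In this toy grammar, five rules derive a text of length six; for highly repetitive texts the gap between grammar size and text length can be much larger, which is the kind of saving a repetition-aware measure such as ẽ_T is meant to capture.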

Title of host publication: String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Proceedings
Editors: Rossano Venturini, Gabriele Fici, Marinella Sciortino
Publisher: Springer Verlag
Publication status: Published - 2017
Event: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017 - Palermo, Italy
Duration: 26 Sep 2017 to 29 Sep 2017


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
10508 LNCS


Other: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

