Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e_T^r log n) bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e_T^r is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
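
To make the notion of a straight-line program concrete, the following is a minimal generic sketch in Python. It is not the construction from the paper. It assumes an SLP given as a dictionary of rules in which each variable derives either one character or the concatenation of two other variables; the names `expansion_lengths`, `access`, and the example grammar are hypothetical and chosen only for illustration. Random access here takes time proportional to the height of the derivation tree, which is O(log n) only if the grammar happens to be balanced.

```python
from typing import Dict, Tuple, Union

# Assumption: a rule is either one terminal character (str)
# or a pair of variable names (tuple) to be concatenated.
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule], start: str) -> Dict[str, int]:
    """Compute the length of the string derived by every variable reachable from start."""
    lengths: Dict[str, int] = {}
    stack = [start]
    while stack:
        var = stack[-1]
        if var in lengths:
            stack.pop()
            continue
        rule = rules[var]
        if isinstance(rule, str):
            lengths[var] = 1          # a terminal rule derives one character
            stack.pop()
            continue
        left, right = rule
        pending = [v for v in (left, right) if v not in lengths]
        if pending:
            stack.extend(pending)     # resolve the children first
        else:
            lengths[var] = lengths[left] + lengths[right]
            stack.pop()
    return lengths


def access(rules: Dict[str, Rule], lengths: Dict[str, int], start: str, i: int) -> str:
    """Return character i (0-based) of the derived text without expanding it."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    var = start
    while True:
        rule = rules[var]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lengths[left]:
            var = left                # position lies in the left part
        else:
            i -= lengths[left]        # shift the offset into the right part
            var = right


# Hypothetical example: five rules derive the 8-character text "abababab".
rules: Dict[str, Rule] = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),     # ab
    "X2": ("X1", "X1"),   # abab
    "X3": ("X2", "X2"),   # abababab
}
lengths = expansion_lengths(rules, "X3")
text = "".join(access(rules, lengths, "X3", i) for i in range(lengths["X3"]))
```

In this example the grammar has five rules while the text has eight characters; for highly repetitive texts the gap between grammar size and text length can be much larger, which is the effect the abstract's SLP size bound refers to.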

Original language: English
Title of host publication: String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Proceedings
Editors: Rossano Venturini, Gabriele Fici, Marinella Sciortino
Publisher: Springer Verlag
Pages: 304-316
Number of pages: 13
ISBN (Print): 9783319674278
DOI
Publication status: Published - Jan 1 2017
Event: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017 - Palermo, Italy
Duration: Sep 26 2017 to Sep 29 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10508 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017
Country: Italy
City: Palermo
Period: 9/26/17 to 9/29/17

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
