Grammar Index by Induced Suffix Sorting

Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

研究成果: Chapter in Book/Report/Conference proceedingConference contribution

抄録

We propose a new compressed text index built upon a grammar compression based on induced suffix sorting [Nunes et al., DCC’18]. We show that this grammar exhibits a locality sensitive parsing property, which allows us to specify, given a pattern P, certain substrings of P, called cores, that are similarly parsed in the text grammar whenever these occurrences are extensible to occurrences of P. Supported by the cores, given a pattern of length m, we can locate all its occ occurrences in a text T of length n within O(mlg | S| + occ Clg | S| lg n+ occ ) time, where S is the set of all characters and non-terminals, occ is the number of occurrences, and occ C is the number of occurrences of a chosen core C of P in the right hand side of all production rules of the grammar of T. Our grammar index requires O(g) words of space and can be built in O(n) time using O(g) working space, where g is the sum of the lengths of the right hand sides of all production rules. We practically evaluate that our proposed index excels at locating long patterns in highly-repetitive texts. Our implementation is available at https://github.com/TooruAkagi/GCIS_Index.

本文言語英語
ホスト出版物のタイトルString Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Proceedings
編集者Thierry Lecroq, Hélène Touzet
出版社Springer Science and Business Media Deutschland GmbH
ページ85-99
ページ数15
ISBN(印刷版)9783030866914
DOI
出版ステータス出版済み - 2021
イベント28th International Symposium on String Processing and Information Retrieval, SPIRE 2021 - Virtual, Online
継続期間: 10 4 202110 6 2021

出版物シリーズ

名前Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
12944 LNCS
ISSN(印刷版)0302-9743
ISSN(電子版)1611-3349

会議

会議28th International Symposium on String Processing and Information Retrieval, SPIRE 2021
CityVirtual, Online
Period10/4/2110/6/21

All Science Journal Classification (ASJC) codes

  • 理論的コンピュータサイエンス
  • コンピュータ サイエンス(全般)

フィンガープリント

「Grammar Index by Induced Suffix Sorting」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル