Grammar Index by Induced Suffix Sorting

Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We propose a new compressed text index built upon a grammar compression based on induced suffix sorting [Nunes et al., DCC’18]. We show that this grammar exhibits a locality sensitive parsing property, which allows us to specify, given a pattern P, certain substrings of P, called cores, that are similarly parsed in the text grammar whenever these occurrences are extensible to occurrences of P. Supported by the cores, given a pattern of length m, we can locate all its occ occurrences in a text T of length n within O(mlg | S| + occ Clg | S| lg n+ occ ) time, where S is the set of all characters and non-terminals, occ is the number of occurrences, and occ C is the number of occurrences of a chosen core C of P in the right hand side of all production rules of the grammar of T. Our grammar index requires O(g) words of space and can be built in O(n) time using O(g) working space, where g is the sum of the lengths of the right hand sides of all production rules. We practically evaluate that our proposed index excels at locating long patterns in highly-repetitive texts. Our implementation is available at https://github.com/TooruAkagi/GCIS_Index.

Original languageEnglish
Title of host publicationString Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Proceedings
EditorsThierry Lecroq, Hélène Touzet
PublisherSpringer Science and Business Media Deutschland GmbH
Pages85-99
Number of pages15
ISBN (Print)9783030866914
DOIs
Publication statusPublished - 2021
Event28th International Symposium on String Processing and Information Retrieval, SPIRE 2021 - Virtual, Online
Duration: Oct 4 2021Oct 6 2021

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12944 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference28th International Symposium on String Processing and Information Retrieval, SPIRE 2021
CityVirtual, Online
Period10/4/2110/6/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Grammar Index by Induced Suffix Sorting'. Together they form a unique fingerprint.

Cite this