Position Heaps for Cartesian-Tree Matching on Strings and Tries

Akio Nishimoto, Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cartesian-tree pattern matching is a recently introduced pattern matching scheme that detects fragments of a sequential data stream whose structure is similar to that of a query pattern. Formally, Cartesian-tree pattern matching seeks all substrings S′ of the text string S such that the Cartesian tree of S′ and that of a query pattern P coincide. In this paper, we present a new indexing structure for this problem, called the Cartesian-tree Position Heap (CPH). Let n be the length of the input text string S, m the length of a query pattern P, and σ the alphabet size. We show that the CPH of S, denoted CPH(S), supports pattern matching queries in O(m(σ + log(min{h, m})) + occ) time with O(n) space, where h is the height of the CPH and occ is the number of pattern occurrences. We show how to build CPH(S) in O(n log σ) time with O(n) working space. Further, we extend the problem to the case where the text is a labeled tree (i.e., a trie). Given a trie T with N nodes, we show that the CPH of T, denoted CPH(T), supports pattern matching queries on the trie in O(m(σ² + log(min{h, m})) + occ) time with O(Nσ) space. We also show a construction algorithm for CPH(T) running in O(Nσ) time and O(Nσ) working space.
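
To make the problem definition concrete, the following is a minimal brute-force sketch of Cartesian-tree matching. It is not the CPH index described in the abstract; it simply rebuilds the Cartesian tree of every length-m window and compares it with that of the pattern, which takes O(nm) time. The function names `cartesian_tree_parents` and `ct_match_naive` are hypothetical, and the tie-breaking rule (the leftmost minimum becomes the root) is an assumption, since the abstract does not state one.

```python
def cartesian_tree_parents(seq):
    """Return the parent array of the Cartesian tree of seq.

    The root is the minimum element (assumption: leftmost minimum on ties)
    and has parent -1. Because the in-order traversal of the tree is the
    index order, the parent array fully determines the tree shape.
    """
    parent = [-1] * len(seq)
    stack = []  # indices on the rightmost path, values non-decreasing
    for i, x in enumerate(seq):
        last = -1
        # Pop every index whose value is strictly larger than x.
        while stack and seq[stack[-1]] > x:
            last = stack.pop()
        if last != -1:
            parent[last] = i       # last popped subtree becomes left child of i
        if stack:
            parent[i] = stack[-1]  # i becomes right child of the new stack top
        stack.append(i)
    return parent


def ct_match_naive(text, pattern):
    """Return all start positions i (0-based) such that text[i:i+m] and
    pattern have the same Cartesian tree. Brute force, O(n * m) time."""
    m = len(pattern)
    target = cartesian_tree_parents(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if cartesian_tree_parents(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # Windows [5, 1, 4] and [4, 2, 6] share the "high, low, high" shape of [2, 1, 3].
    print(ct_match_naive([5, 1, 4, 2, 6, 3], [2, 1, 3]))  # prints [0, 2]
```

An index such as the CPH avoids rescanning the text for each query; this sketch only illustrates which substrings count as occurrences.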

Original language: English
Title of host publication: String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Proceedings
Editors: Thierry Lecroq, Hélène Touzet
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 241-254
Number of pages: 14
ISBN (Print): 9783030866914
Publication status: Published - 2021
Event: 28th International Symposium on String Processing and Information Retrieval, SPIRE 2021 - Virtual, Online
Duration: Oct 4 2021 to Oct 6 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12944 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Symposium on String Processing and Information Retrieval, SPIRE 2021
City: Virtual, Online
Period: 10/4/21 to 10/6/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
