Sparse compact directed acyclic word graphs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

The suffix tree of a string w represents all suffixes of w, and thus supports full indexing of w for exact pattern matching. A sparse suffix tree of w, on the other hand, represents only a subset of the suffixes of w, and therefore supports sparse indexing of w. Sparse suffix trees have a wide range of applications, e.g., natural language processing and biological sequence analysis. Word suffix trees are a variant of sparse suffix trees defined for strings that contain a special word delimiter #. Namely, the word suffix tree of a string w = w_1 w_2 ··· w_k, consisting of k words each ending with #, represents only the k suffixes of w of the form w_i ··· w_k. Recently, we presented an algorithm that builds word suffix trees in O(n) time with O(k) space, where n is the length of w. In addition, we proposed sparse directed acyclic word graphs (SDAWGs) and an on-line algorithm for constructing them, working in O(n) time and space. As a further step in this research direction, this paper introduces a new text indexing structure named sparse compact directed acyclic word graphs (SCDAWGs). We show that the size of SCDAWGs is smaller than that of word suffix trees and SDAWGs, and present an on-line SCDAWG construction algorithm that works in O(n) time with O(k) space.
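
To make the definition of word suffixes concrete, the following is a minimal Python sketch, not the construction algorithm of the paper. It enumerates the k suffixes of the form w_i ··· w_k and stores them in a naive, uncompacted trie, which takes quadratic time and space in the worst case rather than the O(n) time and O(k) space stated above. The function names (`word_suffixes`, `build_word_suffix_trie`, `occurs_at_word_boundary`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

DELIM = "#"  # the special word delimiter used in the abstract


def word_suffixes(w: str) -> List[str]:
    """Return the k suffixes of w that begin at the start of a word."""
    if not w.endswith(DELIM):
        raise ValueError("every word, including the last one, must end with '#'")
    # A word starts at position 0 and right after every delimiter except the final one.
    starts = [0] + [i + 1 for i, c in enumerate(w[:-1]) if c == DELIM]
    return [w[s:] for s in starts]


def build_word_suffix_trie(w: str) -> Dict[str, dict]:
    """Insert each word suffix into a plain nested-dict trie (illustration only)."""
    root: Dict[str, dict] = {}
    for suffix in word_suffixes(w):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
    return root


def occurs_at_word_boundary(trie: Dict[str, dict], pattern: str) -> bool:
    """Report whether pattern is a prefix of some word suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    text = "ab#ab#c#"  # k = 3 words: "ab#", "ab#", "c#"
    print(word_suffixes(text))  # ['ab#ab#c#', 'ab#c#', 'c#']
    trie = build_word_suffix_trie(text)
    print(occurs_at_word_boundary(trie, "ab#c"))  # True: starts at the second word
    print(occurs_at_word_boundary(trie, "b#"))    # False: occurs only mid-word
```

The sketch shows what sparse indexing means in this setting: a pattern is found only if it occurs starting at a word boundary, so "b#" is not reported even though it is a substring of the text.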

Original language: English
Title of host publication: Proceedings of the Prague Stringology Conference '06
Pages: 197-211
Number of pages: 15
Publication status: Published - 2006
Event: Prague Stringology Conference '06, PSC 2006 - Prague, Czech Republic
Duration: Aug 28 2006 to Aug 30 2006

All Science Journal Classification (ASJC) codes

  • Mathematics (all)

Cite this

Inenaga, S., & Takeda, M. (2006). Sparse compact directed acyclic word graphs. In Proceedings of the Prague Stringology Conference '06 (pp. 197-211).

@inproceedings{e8f60e2739474cb68483a4d4f43d8373,
title = "Sparse compact directed acyclic word graphs",
author = "Shunsuke Inenaga and Masayuki Takeda",
year = "2006",
language = "English",
isbn = "8001035336",
pages = "197--211",
booktitle = "Proceedings of the Prague Stringology Conference '06",

}

TY - GEN

T1 - Sparse compact directed acyclic word graphs

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=37849020017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=37849020017&partnerID=8YFLogxK

M3 - Conference contribution

SN - 8001035336

SN - 9788001035337

SP - 197

EP - 211

BT - Proceedings of the Prague Stringology Conference '06

ER -