TY - GEN
T1 - On-line linear-time construction of word suffix trees
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
PY - 2006/1/1
Y1 - 2006/1/1
N2 - Suffix trees are the key data structure for text string matching, and are used in a wide range of application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and ω be a string in D+, namely, ω is a sequence ω1 ⋯ ωk of k words in D. The word suffix tree of ω w.r.t. D is a path-compressed trie that represents only the k suffixes of the form ωi ⋯ ωk. A typical example of its application is word- and phrase-level search on natural language documents. Andersson et al. proposed an algorithm to build word suffix trees in O(n) expected time with O(k) space. In this paper we present a new word suffix tree construction algorithm with O(n) running time and O(k) space in the worst case. Our algorithm is on-line, which means that it can sequentially process the characters in the input, one by one, from left to right.
AB - Suffix trees are the key data structure for text string matching, and are used in a wide range of application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and ω be a string in D+, namely, ω is a sequence ω1 ⋯ ωk of k words in D. The word suffix tree of ω w.r.t. D is a path-compressed trie that represents only the k suffixes of the form ωi ⋯ ωk. A typical example of its application is word- and phrase-level search on natural language documents. Andersson et al. proposed an algorithm to build word suffix trees in O(n) expected time with O(k) space. In this paper we present a new word suffix tree construction algorithm with O(n) running time and O(k) space in the worst case. Our algorithm is on-line, which means that it can sequentially process the characters in the input, one by one, from left to right.
UR - http://www.scopus.com/inward/record.url?scp=33746067513&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746067513&partnerID=8YFLogxK
U2 - 10.1007/11780441_7
DO - 10.1007/11780441_7
M3 - Conference contribution
AN - SCOPUS:33746067513
SN - 3540354557
SN - 9783540354550
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 60
EP - 71
BT - Combinatorial Pattern Matching - 17th Annual Symposium, CPM 2006, Proceedings
PB - Springer Verlag
T2 - 17th Annual Symposium on Combinatorial Pattern Matching, CPM 2006
Y2 - 5 July 2006 through 7 July 2006
ER -