Compressed automata for dictionary matching

I. Tomohiro, Takaaki Nishimoto, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Contribution to journal › Article

5 citations (Scopus)

Abstract

We address a variant of the dictionary matching problem where the dictionary is represented by a straight-line program (SLP). For a given SLP-compressed dictionary D of size n and height h representing m patterns of total length N, we present an O(n^2 log N)-size representation of the Aho-Corasick automaton that recognizes all occurrences of the patterns in D in amortized O(h+m) running time per character. We also propose an algorithm to construct this compressed Aho-Corasick automaton in O(n^3 log n log N) time and O(n^2 log N) space. In the special case where D represents only a single pattern, we present an O(n log N)-size representation of the Morris-Pratt automaton that finds all occurrences of the pattern in amortized O(h) running time per character, and we show how to construct this representation in O(n^3 log n log N) time with O(n^2 log N) working space.
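The abstract combines two ingredients: SLP-compressed patterns and an Aho-Corasick automaton. Both can be illustrated with a small uncompressed baseline.

The sketch below is not the authors' O(n^2 log N)-size construction. It naively decompresses each pattern variable and builds an ordinary Aho-Corasick automaton over the expanded strings, so its space grows with N rather than with n. Several details are assumptions made for illustration only:

- The SLP encoding is hypothetical. Each entry of a Python list is either a single character or a pair of earlier indices.
- The `pattern_vars` list, which marks the variables that are dictionary patterns, is also hypothetical.
- The function names `expand`, `build_aho_corasick` and `search` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding: rule i is either a single character (terminal)
# or a pair (j, k) with j, k < i, meaning X_i -> X_j X_k.
Rule = Union[str, Tuple[int, int]]


def expand(slp: List[Rule], var: int) -> str:
    """Naively decompress variable `var`; recursion depth is the SLP height."""
    rule = slp[var]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(slp, left) + expand(slp, right)


def build_aho_corasick(patterns: List[str]):
    """Build an uncompressed Aho-Corasick automaton (trie + failure links)."""
    goto: List[Dict[str, int]] = [{}]  # trie edges; state 0 is the root
    fail: List[int] = [0]              # failure link of each state
    out: List[List[int]] = [[]]        # ids of patterns recognized at each state
    for pid, pattern in enumerate(patterns):
        state = 0
        for c in pattern:
            if c not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][c] = len(goto) - 1
            state = goto[state][c]
        out[state].append(pid)
    # Breadth-first traversal: a failure link always points to a shallower
    # state, whose own link and output list are already final.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for c, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and c not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(c, 0)
            out[child].extend(out[fail[child]])
    return goto, fail, out


def search(text: str, patterns: List[str], goto, fail, out) -> List[Tuple[int, int]]:
    """Return (start position, pattern id) for every occurrence in `text`."""
    hits = []
    state = 0
    for i, c in enumerate(text):
        while state and c not in goto[state]:
            state = fail[state]  # follow failure links until c can be read
        state = goto[state].get(c, 0)
        for pid in out[state]:
            hits.append((i - len(patterns[pid]) + 1, pid))
    return hits


if __name__ == "__main__":
    # X0 = a, X1 = b, X2 = X0 X1 = ab, X3 = X2 X0 = aba, X4 = X3 X2 = abaab
    slp: List[Rule] = ["a", "b", (0, 1), (2, 0), (3, 2)]
    pattern_vars = [2, 3, 4]  # hypothetical: variables that are dictionary patterns
    patterns = [expand(slp, v) for v in pattern_vars]
    automaton = build_aho_corasick(patterns)
    print(patterns)                                # ['ab', 'aba', 'abaab']
    print(search("abaabab", patterns, *automaton))
    # [(0, 0), (0, 1), (0, 2), (3, 0), (3, 1), (5, 0)]
```

With a single pattern, the trie is a path and state i is the prefix of length i. The failure links then coincide with the Morris-Pratt failure function. For "abaab", `fail` is [0, 0, 0, 1, 1, 2]. This single-pattern case corresponds to the paper's Morris-Pratt result. The paper's contribution is to represent these structures without expanding the SLP; this baseline does expand it.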

Original language: English
Pages (from-to): 30-41
Number of pages: 12
Journal: Theoretical Computer Science
Volume: 578
DOI: 10.1016/j.tcs.2015.01.019
Publication status: Published - May 1 2015

Fingerprint

Glossaries
  • Automata
  • Straight-line Programs
  • Matching Problem
  • Dictionary
  • Character

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Compressed automata for dictionary matching. / Tomohiro, I.; Nishimoto, Takaaki; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki.

In: Theoretical Computer Science, Vol. 578, 01.05.2015, p. 30-41.

@article{f87dc3b99d264d6197426ea891774f55,
title = "Compressed automata for dictionary matching",
author = "I. Tomohiro and Takaaki Nishimoto and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda",
year = "2015",
month = "5",
day = "1",
doi = "10.1016/j.tcs.2015.01.019",
language = "English",
volume = "578",
pages = "30--41",
journal = "Theoretical Computer Science",
issn = "0304-3975",
publisher = "Elsevier",

}

TY - JOUR

T1 - Compressed automata for dictionary matching

AU - Tomohiro, I.

AU - Nishimoto, Takaaki

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Takeda, Masayuki

PY - 2015/5/1

Y1 - 2015/5/1

UR - http://www.scopus.com/inward/record.url?scp=84951866541&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84951866541&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2015.01.019

DO - 10.1016/j.tcs.2015.01.019

M3 - Article

VL - 578

SP - 30

EP - 41

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -