TY - GEN

T1 - Discovering best variable-length-don’t-care patterns

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Shinohara, Ayumi

AU - Takeda, Masayuki

AU - Arikawa, Setsuo

PY - 2002

Y1 - 2002

N2 - A variable-length-don’t-care pattern (VLDC pattern) is an element of set Π = (Σ∪{∗})∗, where Σ is an alphabet and ∗ is a wildcard matching any string in Σ∗. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other does an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using the dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.

AB - A variable-length-don’t-care pattern (VLDC pattern) is an element of set Π = (Σ∪{∗})∗, where Σ is an alphabet and ∗ is a wildcard matching any string in Σ∗. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other does an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using the dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.

UR - http://www.scopus.com/inward/record.url?scp=84949815829&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949815829&partnerID=8YFLogxK

U2 - 10.1007/3-540-36182-0_10

DO - 10.1007/3-540-36182-0_10

M3 - Conference contribution

AN - SCOPUS:84949815829

SN - 3540001883

SN - 9783540001881

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 86

EP - 97

BT - Discovery Science - 5th International Conference, DS 2002, Proceedings

A2 - Lange, Steffen

A2 - Satoh, Ken

A2 - Smith, Carl H.

PB - Springer Verlag

T2 - 5th International Conference on Discovery Science, DS 2002

Y2 - 24 November 2002 through 26 November 2002

ER -