Best fitting fixed-length substring patterns for a set of strings

Hirotaka Ono, Yen Kaow Ng

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

Finding a pattern, or a set of patterns, that best characterizes a set of strings is considered important in the context of Knowledge Discovery as applied in Molecular Biology. Our main objective is to address the problem of "over-generalization", the phenomenon in which a characterization is so general that it potentially includes many incorrect examples. To overcome this, we formally define a criterion for a most fitting language for a set of strings, via a natural notion of density. We show how the problem can be solved by solving the membership problem and the counting problem, and we study the runtime complexity of the problem with respect to three solution spaces derived from unions of the languages generated from fixed-length substring patterns. For two of these, we show the problem to be solvable in time polynomial in the input size. In the third case, however, the problem turns out to be NP-complete.
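The abstract's reduction to a membership problem and a counting problem can be illustrated with a small brute-force sketch. Everything here is an assumption made for illustration, not the paper's formal definitions: a pattern is taken to generate all strings over the alphabet that contain it as a substring, and "density" is taken to be the number of covered sample strings divided by the number of language strings of the same lengths. The function names `is_member`, `count_members`, and `density` are hypothetical.

```python
from itertools import product


def is_member(s, patterns):
    """Membership: s is in the union language if it contains any pattern
    as a substring (assumed semantics, for illustration only)."""
    return any(p in s for p in patterns)


def count_members(n, alphabet, patterns):
    """Counting: number of length-n strings over `alphabet` in the union
    language. Brute-force enumeration, exponential in n."""
    return sum(
        1
        for t in product(alphabet, repeat=n)
        if is_member("".join(t), patterns)
    )


def density(sample, alphabet, patterns):
    """Assumed density: covered sample strings divided by the number of
    language strings at the lengths of those covered strings."""
    covered = {s for s in sample if is_member(s, patterns)}
    lengths = {len(s) for s in covered}
    total = sum(count_members(n, alphabet, patterns) for n in lengths)
    return len(covered) / total if total else 0.0
```

Under these assumptions, for the sample {aab, aba} over the alphabet {a, b}, the single pattern `ab` covers both strings out of 4 language strings of length 3, while the union of `ab` and `ba` covers the same two strings out of 6. The more general language has the lower density, which is the over-generalization effect the abstract describes.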

Original language: English
Pages (from-to): 240-250
Number of pages: 11
Journal: Lecture Notes in Computer Science
Volume: 3595
Publication status: Published - Oct 24 2005
Event: 11th Annual International Conference on Computing and Combinatorics, COCOON 2005 - Kunming, China
Duration: Aug 16 2005 → Aug 29 2005

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
