### Abstract

We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $\Sigma$ is a family $\Omega$ of subsets of $\Sigma$. A motif $\pi$ of type $\Omega$ is a string $\pi = \pi_1 \cdots \pi_n$ of motif components, each of which stands for an element of $\Omega$. Given a yes-no sample $S = \{(\alpha^{(1)}, \beta^{(1)}), \ldots, (\alpha^{(m)}, \beta^{(m)})\}$ of pairs of strings over $\Sigma$ with $\alpha^{(i)} \neq \beta^{(i)}$ for $1 \le i \le m$, the BCM problem for $\Omega$ is to find a motif $\pi$ of type $\Omega$ that maximizes the number of good pairs in $S$, where $(\alpha^{(i)}, \beta^{(i)})$ is good for $\pi$ if $\pi$ accepts $\alpha^{(i)}$ and rejects $\beta^{(i)}$. We prove that the BCM problem is NP-complete even for the very simple type $\Omega_1 = \{ z \mid \emptyset \neq z \subseteq \Sigma \}$, which is used in practice to describe protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type $\Omega_\infty = \Omega_1 \cup \{\Sigma^+\} \cup \{\Sigma^{(i,j)} \mid 1 \le i \le j\}$, where $\Sigma^{(i,j)}$ is the set of strings over $\Sigma$ of length between $i$ and $j$. Furthermore, for the BCM problem for $\Omega_1$, we provide a polynomial-time greedy algorithm based on the probabilistic method. Our performance analysis gives an explicit approximation ratio for the algorithm.
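To make the objective concrete, the following is a minimal sketch of how good pairs might be counted for a motif of type $\Omega_1$. It assumes, since the abstract does not spell this out, that a motif accepts a string exactly when the string has the same length as the motif and each character belongs to the corresponding component. The names `accepts` and `count_good_pairs` and the toy DNA sample are hypothetical and serve only as illustration; this is not the greedy algorithm of the paper.

```python
from typing import FrozenSet, Sequence, Tuple

# A motif of type Omega_1: a sequence of non-empty subsets of the alphabet.
Motif = Sequence[FrozenSet[str]]


def accepts(motif: Motif, s: str) -> bool:
    """Assumed semantics: s matches position by position, with equal length."""
    return len(s) == len(motif) and all(ch in comp for ch, comp in zip(s, motif))


def count_good_pairs(motif: Motif, sample: Sequence[Tuple[str, str]]) -> int:
    """Count pairs (alpha, beta) where the motif accepts alpha and rejects beta."""
    return sum(
        1 for alpha, beta in sample
        if accepts(motif, alpha) and not accepts(motif, beta)
    )


# Hypothetical toy data over the DNA alphabet {A, C, G, T}.
motif = [frozenset("AG"), frozenset("C"), frozenset("ACGT")]
sample = [("ACT", "TCT"), ("GCA", "GCG"), ("ACC", "CCC")]

print(count_good_pairs(motif, sample))  # 2
```

Here the second pair is not good because the motif accepts both `GCA` and `GCG`. The BCM problem asks for the motif that maximizes this count over all motifs of the given type.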

| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 55-64 |
| Number of pages | 10 |
| Journal | IEICE technical report. Theoretical foundations of Computing |
| Volume | 95 |
| Issue number | 344 |
| Publication status | Published - Oct 27 1995 |

## Cite this

Tateishi, E., Maruyama, O., & Miyano, S. (1995). Extracting Best Consensus Motifs from Positive and Negative Examples. *IEICE technical report. Theoretical foundations of Computing*, *95*(344), 55-64.