TY - GEN

T1 - Extracting best consensus motifs from positive and negative examples

AU - Tateishi, Erika

AU - Maruyama, Osamu

AU - Miyano, Satoru

N1 - Publisher Copyright:
© 1996, Springer Verlag. All rights reserved.

PY - 1996

Y1 - 1996

N2 - We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet Σ is a family Ω of subsets of Σ*. A motif π of type Ω is a string π = π1 ... πn of motif components, each of which stands for an element of Ω. The BCM problem for Ω is, given a yes-no sample S = {(α(1), β(1)), ..., (α(m), β(m))} of pairs of strings in Σ* with α(i) ≠ β(i) for 1 ≤ i ≤ m, to find a motif π of type Ω that maximizes the number of good pairs in S, where (α(i), β(i)) is good for π if π accepts α(i) and rejects β(i). We prove that the BCM problem is NP-complete even for a very simple type (Formula presented), which is used in practice for describing protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type Ω∞ = Ω1 ∪ {Σ+} ∪ {Σ[i,j] | 1 ≤ i ≤ j}, where Σ[i,j] is the set of strings over Σ of length between i and j. Furthermore, for the BCM problem for Ω1 we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis gives an explicit approximation ratio for the algorithm.

AB - We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet Σ is a family Ω of subsets of Σ*. A motif π of type Ω is a string π = π1 ... πn of motif components, each of which stands for an element of Ω. The BCM problem for Ω is, given a yes-no sample S = {(α(1), β(1)), ..., (α(m), β(m))} of pairs of strings in Σ* with α(i) ≠ β(i) for 1 ≤ i ≤ m, to find a motif π of type Ω that maximizes the number of good pairs in S, where (α(i), β(i)) is good for π if π accepts α(i) and rejects β(i). We prove that the BCM problem is NP-complete even for a very simple type (Formula presented), which is used in practice for describing protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type Ω∞ = Ω1 ∪ {Σ+} ∪ {Σ[i,j] | 1 ≤ i ≤ j}, where Σ[i,j] is the set of strings over Σ of length between i and j. Furthermore, for the BCM problem for Ω1 we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis gives an explicit approximation ratio for the algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84948184174&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948184174&partnerID=8YFLogxK

U2 - 10.1007/3-540-60922-9_19

DO - 10.1007/3-540-60922-9_19

M3 - Conference contribution

AN - SCOPUS:84948184174

SN - 9783540609223

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 219

EP - 230

BT - STACS 1996 - 13th Annual Symposium on Theoretical Aspects of Computer Science, Proceedings

A2 - Puech, Claude

A2 - Reischuk, Rüdiger

PB - Springer Verlag

T2 - 13th Annual Symposium on Theoretical Aspects of Computer Science, STACS 1996

Y2 - 22 February 1996 through 24 February 1996

ER -