Finding optimal pairs of patterns

Hideo Bannai, Heikki Hyyrö, Ayumi Shinohara, Masayuki Takeda, Kenta Nakai, Satoru Miyano

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e., finding the pair of patterns that is best with respect to some appropriate scoring function that gives higher scores to pattern pairs which occur more in the strings of one set, but less in the other. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns, where N is the total length of the strings. The algorithm considers all possible Boolean combinations of the patterns, e.g. patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to match a given string s if p occurs in s AND q does NOT occur in s. The same algorithm can be applied to a variant of the problem where we are given a single set of sequences along with a numeric attribute assigned to each sequence, and the problem is to find the optimal pattern pair whose occurrence in the sequences is correlated with this numeric attribute. An efficient implementation based on suffix arrays is presented, and the algorithm is applied to several nucleotide sequence datasets of moderate size, combined with microarray gene expression data, aiming to find regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing certain genomic functions.
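
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the O(N^2) suffix-array algorithm of the paper and is far slower; it only illustrates what "optimal pair under a Boolean combination" means. The scoring function (difference of match fractions), the particular set of combinations, and all function names are assumptions made for illustration.

```python
from itertools import product

def substrings(strings):
    """Collect every distinct non-empty substring of the given strings."""
    subs = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                subs.add(s[i:j])
    return subs

# A few Boolean combinations of "p occurs in s" (a) and "q occurs in s" (b).
# The selection here is illustrative, not the paper's full set.
COMBINATIONS = {
    "p AND q": lambda a, b: a and b,
    "p AND NOT q": lambda a, b: a and not b,
    "p OR q": lambda a, b: a or b,
}

def score(pos_hits, neg_hits, n_pos, n_neg):
    """Assumed score: fraction matched in one set minus fraction matched in the other."""
    return pos_hits / n_pos - neg_hits / n_neg

def best_pair(positives, negatives):
    """Exhaustively try every substring pair and combination; return the best one."""
    patterns = sorted(substrings(positives + negatives))
    best = None
    for p, q in product(patterns, repeat=2):
        for name, combine in COMBINATIONS.items():
            pos_hits = sum(combine(p in s, q in s) for s in positives)
            neg_hits = sum(combine(p in s, q in s) for s in negatives)
            value = score(pos_hits, neg_hits, len(positives), len(negatives))
            if best is None or value > best[0]:
                best = (value, p, q, name)
    return best

if __name__ == "__main__":
    # Toy data: the first set is separated by "p occurs AND q does not occur".
    print(best_pair(["acg", "aca"], ["actt", "ggtt"]))
```

On the toy input, a pair such as p = "ac", q = "tt" under "p AND NOT q" matches every string of the first set and none of the second; several pairs tie at that score, and the sketch simply returns the first one it encounters.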

Original language: English
Pages (from-to): 450-462
Number of pages: 13
Journal: Lecture Notes in Computer Science
Volume: 3240
Publication status: Published - 2004

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., & Miyano, S. (2004). Finding optimal pairs of patterns. Lecture Notes in Computer Science, 3240, 450-462.

@article{0b5975afc59d4e75a4c1b41eaeee67db,
title = "Finding optimal pairs of patterns",
author = "Hideo Bannai and Heikki Hyyr{\"o} and Ayumi Shinohara and Masayuki Takeda and Kenta Nakai and Satoru Miyano",
year = "2004",
language = "English",
volume = "3240",
pages = "450--462",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Finding optimal pairs of patterns

AU - Bannai, Hideo

AU - Hyyrö, Heikki

AU - Shinohara, Ayumi

AU - Takeda, Masayuki

AU - Nakai, Kenta

AU - Miyano, Satoru

PY - 2004

Y1 - 2004

UR - http://www.scopus.com/inward/record.url?scp=35048820942&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35048820942&partnerID=8YFLogxK

M3 - Article

VL - 3240

SP - 450

EP - 462

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -