TY - JOUR

T1 - An O(N^2) algorithm for discovering optimal Boolean pattern pairs

AU - Bannai, Hideo

AU - Hyyrö, Heikki

AU - Shinohara, Ayumi

AU - Takeda, Masayuki

AU - Nakai, Kenta

AU - Miyano, Satoru

N1 - Funding Information:
This work was supported in part by Grant-in-Aid for Encouragement of Young Scientists (B) and Grant-in-Aid for Scientific Research on Priority Areas (C) “Genome Biology” from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. Computational resources for the experiments were provided by The Human Genome Center Super Computer System at the Institute of Medical Science, University of Tokyo. The authors are also grateful to Dr. Seiya Imoto (Human Genome Center, Institute of Medical Science, University of Tokyo) for helpful comments concerning the scoring functions.

PY - 2004/10

Y1 - 2004/10

N2 - We consider the problem of finding the optimal combination of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns combined with Boolean functions, where N is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to occur in a given string s if p occurs in s AND q does NOT occur in s. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best k-pattern Boolean combination in O(N^k) time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing mRNA decay.

AB - We consider the problem of finding the optimal combination of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns combined with Boolean functions, where N is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to occur in a given string s if p occurs in s AND q does NOT occur in s. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best k-pattern Boolean combination in O(N^k) time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing mRNA decay.

UR - http://www.scopus.com/inward/record.url?scp=14744276878&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14744276878&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2004.36

DO - 10.1109/TCBB.2004.36

M3 - Article

C2 - 17051698

AN - SCOPUS:14744276878

SN - 1545-5963

VL - 1

SP - 159

EP - 170

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

IS - 4

ER -