TY - JOUR
T1 - Sampling strategy for protein complex prediction using cluster size frequency
AU - Tatsuke, Daisuke
AU - Maruyama, Osamu
N1 - Funding Information:
We would like to thank two anonymous reviewers for their helpful comments and questions on a draft of this paper. This work was partially supported by a grant from the Kyushu University Global COE (Centers of Excellence) Program, “Education-and-Research Hub for Mathematics-for-Industry,” from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.
PY - 2013/4/10
Y1 - 2013/4/10
N2 - In this paper we propose a Markov chain Monte Carlo sampling method for predicting protein complexes from protein-protein interactions (PPIs). Many of the existing tools for this problem are designed more or less based on a density measure of a subgraph of the PPI network. This kind of measure is less effective for smaller complexes. On the other hand, the number of complexes of a given size in a database of protein complexes is found to follow a power law. Thus, most of the complexes are small. For example, in CYC2008, a database of curated protein complexes of yeast, 42% of the complexes are heterodimeric, i.e., they consist of two different proteins. In this work, we propose a protein complex prediction algorithm, called PPSampler (Proteins' Partition Sampler), which is based on the Metropolis-Hastings algorithm and uses a parameter representing a target value of the relative frequency of predicted protein complexes of a particular size. In a performance comparison, PPSampler outperforms other existing algorithms. Furthermore, about half of the predicted clusters that are not matched with any known complexes in CYC2008 are statistically significant by Gene Ontology terms. Some of them can be expected to be true complexes.
AB - In this paper we propose a Markov chain Monte Carlo sampling method for predicting protein complexes from protein-protein interactions (PPIs). Many of the existing tools for this problem are designed more or less based on a density measure of a subgraph of the PPI network. This kind of measure is less effective for smaller complexes. On the other hand, the number of complexes of a given size in a database of protein complexes is found to follow a power law. Thus, most of the complexes are small. For example, in CYC2008, a database of curated protein complexes of yeast, 42% of the complexes are heterodimeric, i.e., they consist of two different proteins. In this work, we propose a protein complex prediction algorithm, called PPSampler (Proteins' Partition Sampler), which is based on the Metropolis-Hastings algorithm and uses a parameter representing a target value of the relative frequency of predicted protein complexes of a particular size. In a performance comparison, PPSampler outperforms other existing algorithms. Furthermore, about half of the predicted clusters that are not matched with any known complexes in CYC2008 are statistically significant by Gene Ontology terms. Some of them can be expected to be true complexes.
UR - http://www.scopus.com/inward/record.url?scp=84875380841&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875380841&partnerID=8YFLogxK
U2 - 10.1016/j.gene.2012.11.050
DO - 10.1016/j.gene.2012.11.050
M3 - Article
C2 - 23235119
AN - SCOPUS:84875380841
VL - 518
SP - 152
EP - 158
JO - Gene
JF - Gene
SN - 0378-1119
IS - 1
ER -