TY - JOUR
T1 - RocSampler
T2 - regularizing overlapping protein complexes in protein-protein interaction networks
AU - Maruyama, Osamu
AU - Kuwahara, Yuki
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP26330330, JP17K00407. Publication costs were funded by JSPS KAKENHI Grant Number JP17K00407.
PY - 2017/12/6
Y1 - 2017/12/6
N2 - BACKGROUND: In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources for elucidating various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. They are relatively difficult to identify because of their simpler internal structure, yet complexes of such sizes are dominant in major protein complex databases, such as CYC2008. The other difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins, because CYC2008 and other databases include such protein complexes. Thus, modeling overlaps between predicted complexes is critical to identifying them simultaneously. RESULTS: In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), whose scoring function includes a regularization term for the overlaps of predicted complexes and another for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB, and it is available at http://imi.kyushu-u.ac.jp/~om/software/RocSampler/ as an executable file for Windows. CONCLUSIONS: We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that designing scoring functions that include regularization terms is an effective approach to protein complex prediction.
AB - BACKGROUND: In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources for elucidating various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. They are relatively difficult to identify because of their simpler internal structure, yet complexes of such sizes are dominant in major protein complex databases, such as CYC2008. The other difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins, because CYC2008 and other databases include such protein complexes. Thus, modeling overlaps between predicted complexes is critical to identifying them simultaneously. RESULTS: In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), whose scoring function includes a regularization term for the overlaps of predicted complexes and another for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB, and it is available at http://imi.kyushu-u.ac.jp/~om/software/RocSampler/ as an executable file for Windows. CONCLUSIONS: We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that designing scoring functions that include regularization terms is an effective approach to protein complex prediction.
UR - http://www.scopus.com/inward/record.url?scp=85043287122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043287122&partnerID=8YFLogxK
U2 - 10.1186/s12859-017-1920-5
DO - 10.1186/s12859-017-1920-5
M3 - Article
C2 - 29244010
AN - SCOPUS:85043287122
SN - 1471-2105
VL - 18
SP - 491
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 491
ER -