TY - GEN
T1 - Protein clustering on a Grassmann manifold
AU - Suryanto, Chendra Hadi
AU - Saigo, Hiroto
AU - Fukui, Kazuhiro
PY - 2012
Y1 - 2012
N2 - We propose a new method for clustering 3D protein structures. In our method, the 3D structure of a protein is represented by a linear subspace, which is generated using PCA from the set of synthesized multi-view images of the protein. The similarity of two protein structures is then defined by the canonical angles between the corresponding subspaces. The merit of this approach is that we can avoid the difficulties of protein structure alignments because this similarity measure does not rely on the precise alignment and geometry of each alpha carbon atom. In this approach, we tackle the protein structure clustering problem by considering the set of subspaces corresponding to the various proteins. The clustering of subspaces with the same dimension is equivalent to the clustering of a corresponding set of points on a Grassmann manifold. Therefore, we call our approach the Grassmannian Protein Clustering Method (GPCM). We evaluate the effectiveness of our method through experiments on the clustering of randomly selected proteins from the Protein Data Bank into four classes: alpha, beta, alpha/beta, alpha+beta (with multi-domain protein). The results show that GPCM outperforms the k-means clustering with Gauss Integrals Tuned, which is a state-of-the-art descriptor of protein structure.
UR - http://www.scopus.com/inward/record.url?scp=84868705139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84868705139&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-34123-6_7
DO - 10.1007/978-3-642-34123-6_7
M3 - Conference contribution
AN - SCOPUS:84868705139
SN - 9783642341229
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 71
EP - 81
BT - Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings
T2 - 7th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2012
Y2 - 8 November 2012 through 10 November 2012
ER -