Protein clustering on a Grassmann manifold

Chendra Hadi Suryanto, Hiroto Saigo, Kazuhiro Fukui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We propose a new method for clustering 3D protein structures. In our method, the 3D structure of a protein is represented by a linear subspace, which is generated by applying PCA to a set of synthesized multi-view images of the protein. The similarity of two protein structures is then defined by the canonical angles between the corresponding subspaces. The merit of this approach is that it avoids the difficulties of protein structure alignment, because the similarity measure does not rely on the precise alignment and geometry of each alpha carbon atom. We tackle the protein structure clustering problem by considering the set of subspaces corresponding to the various proteins. Clustering subspaces of the same dimension is equivalent to clustering the corresponding set of points on a Grassmann manifold; we therefore call our approach the Grassmannian Protein Clustering Method (GPCM). We evaluate the effectiveness of our method through experiments on clustering proteins randomly selected from the Protein Data Bank into four classes: alpha, beta, alpha/beta, and alpha+beta (with multi-domain proteins). The results show that GPCM outperforms k-means clustering with Gauss Integrals Tuned, a state-of-the-art descriptor of protein structure.
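
The subspace representation and canonical-angle similarity described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of uncentered PCA computed via SVD, and the mean squared cosine as a single similarity score are all assumptions made for the example.

```python
import numpy as np


def subspace_basis(images, dim):
    """Return an orthonormal basis (n_pixels x dim) for an image set.

    `images` is an (n_views, n_pixels) array of vectorized multi-view
    images. Assumption: PCA is applied without mean-centering, so the
    basis is simply the top `dim` left singular vectors of the data.
    """
    X = np.asarray(images, dtype=float).T  # columns are images
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]


def canonical_angles(U1, U2):
    """Canonical (principal) angles between two subspaces, in radians.

    U1 and U2 must have orthonormal columns. The cosines of the angles
    are the singular values of U1^T U2.
    """
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # clip guards against round-off


def similarity(U1, U2):
    """Hypothetical similarity score: mean squared cosine of the angles."""
    return float(np.mean(np.cos(canonical_angles(U1, U2)) ** 2))
```

Under these assumptions, identical subspaces score 1 and mutually orthogonal subspaces score 0, so a pairwise similarity matrix over all proteins could serve as the input to a clustering step.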

Original language: English
Title of host publication: Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings
Pages: 71-81
Number of pages: 11
DOI: https://doi.org/10.1007/978-3-642-34123-6_7
Publication status: Published - Nov 16 2012
Externally published: Yes
Event: 7th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2012 - Tokyo, Japan
Duration: Nov 8 2012 to Nov 10 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7632 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2012
Country: Japan
City: Tokyo
Period: 11/8/12 to 11/10/12

Fingerprint

Grassmann Manifold
Clustering
Proteins
Protein
Protein Structure
Subspace
Grassmannian
Clustering Methods
Alignment
K-means Clustering
Similarity Measure
Set of points
Descriptors
Gauss
Carbon
Angle
Evaluate

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Suryanto, C. H., Saigo, H., & Fukui, K. (2012). Protein clustering on a Grassmann manifold. In Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings (pp. 71-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7632 LNBI). https://doi.org/10.1007/978-3-642-34123-6_7

@inproceedings{e62c502c85d041b7afdbcc0e1747aca5,
title = "Protein clustering on a Grassmann manifold",
author = "Suryanto, {Chendra Hadi} and Hiroto Saigo and Kazuhiro Fukui",
year = "2012",
month = "11",
day = "16",
doi = "10.1007/978-3-642-34123-6_7",
language = "English",
isbn = "9783642341229",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "71--81",
booktitle = "Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings",

}

TY - GEN

T1 - Protein clustering on a Grassmann manifold

AU - Suryanto, Chendra Hadi

AU - Saigo, Hiroto

AU - Fukui, Kazuhiro

PY - 2012/11/16

Y1 - 2012/11/16

AB - We propose a new method for clustering 3D protein structures. In our method, the 3D structure of a protein is represented by a linear subspace, which is generated using PCA from the set of synthesized multi-view images of the protein. The similarity of two protein structures is then defined by the canonical angles between the corresponding subspaces. The merit of this approach is that we can avoid the difficulties of protein structure alignments because this similarity measure does not rely on the precise alignment and geometry of each alpha carbon atom. In this approach, we tackle the protein structure clustering problem by considering the set of subspaces corresponding to the various proteins. The clustering of subspaces with the same dimension is equivalent to the clustering of a corresponding set of points on a Grassmann manifold. Therefore, we call our approach the Grassmannian Protein Clustering Method (GPCM). We evaluate the effectiveness of our method through experiments on the clustering of randomly selected proteins from the Protein Data Bank into four classes: alpha, beta, alpha/beta, alpha+beta (with multi-domain protein). The results show that GPCM outperforms the k-means clustering with Gauss Integrals Tuned, which is a state-of-the-art descriptor of protein structure.

UR - http://www.scopus.com/inward/record.url?scp=84868705139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84868705139&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-34123-6_7

DO - 10.1007/978-3-642-34123-6_7

M3 - Conference contribution

SN - 9783642341229

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 71

EP - 81

BT - Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings

ER -