Graph mining in chemoinformatics

Hiroto Saigo, Koji Tsuda

Research output: Chapter in Book/Report/Conference proceeding > Chapter

3 Citations (Scopus)

Abstract

In standard QSAR (quantitative structure-activity relationship) approaches, chemical compounds are represented as a set of physicochemical property descriptors, which are then used as numerical features for classification or regression. However, standard descriptors such as structural keys and fingerprints are not comprehensive enough in many cases. Since chemical compounds are naturally represented as attributed graphs, graph mining techniques make it possible to create subgraph patterns (i.e., structural motifs) that can be used as additional descriptors. In this chapter, the authors present theoretically motivated QSAR algorithms that can automatically identify informative subgraph patterns. A graph mining subroutine is embedded in the mother algorithm and is called repeatedly to collect patterns progressively. The authors present three variations that build on support vector machines (SVM), partial least squares regression (PLS), and least angle regression (LARS). In comparison to graph kernels, these methods are more interpretable, thereby allowing chemists to identify salient subgraph features to improve the drug-likeness of lead compounds.

Original language: English
Title of host publication: Chemoinformatics and Advanced Machine Learning Perspectives
Subtitle of host publication: Complex Computational Methods and Collaborative Techniques
Publisher: IGI Global
Pages: 95-128
Number of pages: 34
ISBN (Print): 9781615209118
DOI: https://doi.org/10.4018/978-1-61520-911-8.ch006
Publication status: Published - Dec 1 2010
Externally published: Yes

Fingerprint

quantitative structure-activity relationships
Quantitative Structure-Activity Relationship
chemical compounds
Dermatoglyphics
chemists
Least-Squares Analysis
least squares
physicochemical properties
seeds
methodology
support vector machines
Lead
Support Vector Machine

All Science Journal Classification (ASJC) codes

  • Agricultural and Biological Sciences (all)

Cite this

Saigo, H., & Tsuda, K. (2010). Graph mining in chemoinformatics. In Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques (pp. 95-128). IGI Global. https://doi.org/10.4018/978-1-61520-911-8.ch006

TY - CHAP
T1 - Graph mining in chemoinformatics
AU - Saigo, Hiroto
AU - Tsuda, Koji
PY - 2010/12/1
Y1 - 2010/12/1
AB - In standard QSAR (quantitative structure-activity relationship) approaches, chemical compounds are represented as a set of physicochemical property descriptors, which are then used as numerical features for classification or regression. However, standard descriptors such as structural keys and fingerprints are not comprehensive enough in many cases. Since chemical compounds are naturally represented as attributed graphs, graph mining techniques make it possible to create subgraph patterns (i.e., structural motifs) that can be used as additional descriptors. In this chapter, the authors present theoretically motivated QSAR algorithms that can automatically identify informative subgraph patterns. A graph mining subroutine is embedded in the mother algorithm and is called repeatedly to collect patterns progressively. The authors present three variations that build on support vector machines (SVM), partial least squares regression (PLS), and least angle regression (LARS). In comparison to graph kernels, these methods are more interpretable, thereby allowing chemists to identify salient subgraph features to improve the drug-likeness of lead compounds.
UR - http://www.scopus.com/inward/record.url?scp=84862850892&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862850892&partnerID=8YFLogxK
U2 - 10.4018/978-1-61520-911-8.ch006
DO - 10.4018/978-1-61520-911-8.ch006
M3 - Chapter
AN - SCOPUS:84862850892
SN - 9781615209118
SP - 95
EP - 128
BT - Chemoinformatics and Advanced Machine Learning Perspectives
PB - IGI Global
ER -