Mining Discriminative Patterns from Graph Data with Multiple Labels and Its Application to Quantitative Structure-Activity Relationship (QSAR) Models

Zheng Shao, Yuya Hirayama, Yoshihiro Yamanishi, Hiroto Saigo

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Graph data are becoming increasingly common in machine learning and data mining, and their applications extend to bioinformatics and cheminformatics. Accordingly, graph mining, a method for extracting patterns from graph data, has recently been studied and developed rapidly. Since the number of patterns in graph data is huge, a central issue is how to efficiently collect informative patterns suitable for subsequent tasks such as classification or regression. In this paper, we consider mining discriminative subgraphs from graph data with multiple labels. The resulting task has important applications in cheminformatics, such as finding common functional groups that trigger multiple drug side effects, or identifying ligand functional groups that hit multiple targets. In computational experiments, we first verify the effectiveness of the proposed approach on synthetic data, and then apply it to a drug adverse effect prediction problem. On the latter dataset, we compared the proposed method with L1-norm logistic regression in combination with the PubChem/Open Babel fingerprint; the proposed method showed superior performance with a much smaller number of subgraph patterns. Software is available from https://github.com/axot/GLP.

Original language: English
Pages (from-to): 2519-2527
Number of pages: 9
Journal: Journal of Chemical Information and Modeling
Volume: 55
Issue number: 12
DOI: 10.1021/acs.jcim.5b00376
Publication status: Published - Dec 28 2015

Fingerprint

activity structure
Functional groups
Labels
Bioinformatics
Pharmaceutical Preparations
Data mining
Learning systems
Logistics
Ligands
drug
regression
Experiments
Group
logistics
experiment
learning
performance

All Science Journal Classification (ASJC) codes

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Mining Discriminative Patterns from Graph Data with Multiple Labels and Its Application to Quantitative Structure-Activity Relationship (QSAR) Models. / Shao, Zheng; Hirayama, Yuya; Yamanishi, Yoshihiro; Saigo, Hiroto.

In: Journal of Chemical Information and Modeling, Vol. 55, No. 12, 28.12.2015, p. 2519-2527.

@article{3032b979b6604047a4df903c47d82995,
title = "Mining Discriminative Patterns from Graph Data with Multiple Labels and Its Application to Quantitative Structure-Activity Relationship (QSAR) Models",
abstract = "Graph data are becoming increasingly common in machine learning and data mining, and their applications extend to bioinformatics and cheminformatics. Accordingly, graph mining, a method for extracting patterns from graph data, has recently been studied and developed rapidly. Since the number of patterns in graph data is huge, a central issue is how to efficiently collect informative patterns suitable for subsequent tasks such as classification or regression. In this paper, we consider mining discriminative subgraphs from graph data with multiple labels. The resulting task has important applications in cheminformatics, such as finding common functional groups that trigger multiple drug side effects, or identifying ligand functional groups that hit multiple targets. In computational experiments, we first verify the effectiveness of the proposed approach on synthetic data, and then apply it to a drug adverse effect prediction problem. On the latter dataset, we compared the proposed method with L1-norm logistic regression in combination with the PubChem/Open Babel fingerprint; the proposed method showed superior performance with a much smaller number of subgraph patterns. Software is available from https://github.com/axot/GLP.",
author = "Zheng Shao and Yuya Hirayama and Yoshihiro Yamanishi and Hiroto Saigo",
year = "2015",
month = "12",
day = "28",
doi = "10.1021/acs.jcim.5b00376",
language = "English",
volume = "55",
pages = "2519--2527",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "12",

}

TY - JOUR

T1 - Mining Discriminative Patterns from Graph Data with Multiple Labels and Its Application to Quantitative Structure-Activity Relationship (QSAR) Models

AU - Shao, Zheng

AU - Hirayama, Yuya

AU - Yamanishi, Yoshihiro

AU - Saigo, Hiroto

PY - 2015/12/28

Y1 - 2015/12/28

AB - Graph data are becoming increasingly common in machine learning and data mining, and their applications extend to bioinformatics and cheminformatics. Accordingly, graph mining, a method for extracting patterns from graph data, has recently been studied and developed rapidly. Since the number of patterns in graph data is huge, a central issue is how to efficiently collect informative patterns suitable for subsequent tasks such as classification or regression. In this paper, we consider mining discriminative subgraphs from graph data with multiple labels. The resulting task has important applications in cheminformatics, such as finding common functional groups that trigger multiple drug side effects, or identifying ligand functional groups that hit multiple targets. In computational experiments, we first verify the effectiveness of the proposed approach on synthetic data, and then apply it to a drug adverse effect prediction problem. On the latter dataset, we compared the proposed method with L1-norm logistic regression in combination with the PubChem/Open Babel fingerprint; the proposed method showed superior performance with a much smaller number of subgraph patterns. Software is available from https://github.com/axot/GLP.

UR - http://www.scopus.com/inward/record.url?scp=84952794539&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952794539&partnerID=8YFLogxK

U2 - 10.1021/acs.jcim.5b00376

DO - 10.1021/acs.jcim.5b00376

M3 - Article

C2 - 26549421

AN - SCOPUS:84952794539

VL - 55

SP - 2519

EP - 2527

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 12

ER -