KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics

Masaaki Kotera, Yasuo Tabei, Yoshihiro Yamanishi, Yuki Moriya, Toshiaki Tokimatsu, Minoru Kanehisa, Susumu Goto

Research output: Contribution to journal (Article)

14 Citations (Scopus)

Abstract

Background: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these descriptors are difficult to interpret because they are not directly linked to the functional-group and substructure terminology that biochemists use.

Methods: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures with seven attributes that resemble the way biochemists reason about substructures.

Results: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than in clusters obtained with conventional chemical fingerprints. We further applied KCF-S to predict pairs of molecules that may be converted into each other by enzymatic reactions, and KCF-S clearly improved predictive performance compared with previously reported results.

Conclusions: KCF-S defines biochemical substructures while keeping them interpretable, suggesting its potential for further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, so molecules from any data source can be handled.
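The abstract contrasts opaque descriptor vectors with substructure terms that biochemists recognize. The following is a minimal sketch of that contrast, assuming each molecule has already been reduced to counts of named substructures. The labels (`carboxyl`, `hydroxyl`, and so on) and the helper names are hypothetical placeholders, not actual KCF-S descriptor strings or KEGG tooling, and the Molfile-to-KCF-S conversion is not reproduced. Because every vector dimension carries a name, the features behind a similarity score can be read off directly. The min/max count similarity used here is a generic choice, not necessarily the measure used in the paper.

```python
from collections import Counter

# Hypothetical substructure labels: illustrative placeholders for named
# features such as functional groups, NOT actual KCF-S descriptor strings.
mol_a = Counter({"carboxyl": 1, "hydroxyl": 2, "benzene_ring": 1})
mol_b = Counter({"carboxyl": 1, "hydroxyl": 1, "amine": 1})


def to_vector(mol: Counter, vocabulary: list[str]) -> list[int]:
    """Lay out substructure counts as a descriptor vector with named dimensions."""
    return [mol[name] for name in vocabulary]  # missing labels count as 0


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Min/max (weighted Jaccard) similarity of two substructure-count vectors."""
    names = set(a) | set(b)
    shared = sum(min(a[n], b[n]) for n in names)
    total = sum(max(a[n], b[n]) for n in names)
    return shared / total if total else 1.0


def shared_substructures(a: Counter, b: Counter) -> dict[str, int]:
    """Named features that contribute to the similarity score."""
    return {n: min(a[n], b[n]) for n in sorted(set(a) & set(b))}


vocabulary = sorted(set(mol_a) | set(mol_b))
print(vocabulary)                           # ['amine', 'benzene_ring', 'carboxyl', 'hydroxyl']
print(to_vector(mol_a, vocabulary))         # [0, 1, 1, 2]
print(to_vector(mol_b, vocabulary))         # [1, 0, 1, 1]
print(count_tanimoto(mol_a, mol_b))         # 0.4
print(shared_substructures(mol_a, mol_b))   # {'carboxyl': 1, 'hydroxyl': 1}
```

A conventional hashed fingerprint would produce a similar score, but its bits would not map back to names like these; keeping the dimensions labeled is what makes the score explainable.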

Original language: English
Article number: S2
Journal: BMC Systems Biology
Volume: 7
DOI: 10.1186/1752-0509-7-S6-S2
Publication status: Published (Jan 1 2013)

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Modelling and Simulation
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics. / Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu.

In: BMC Systems Biology, Vol. 7, S2, 01.01.2013.


Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu. / KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics. In: BMC Systems Biology. 2013; Vol. 7.
@article{72ffa4db1c514ed3a4a6da92a19c19ab,
title = "KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics",
author = "Masaaki Kotera and Yasuo Tabei and Yoshihiro Yamanishi and Yuki Moriya and Toshiaki Tokimatsu and Minoru Kanehisa and Susumu Goto",
year = "2013",
month = "1",
day = "1",
doi = "10.1186/1752-0509-7-S6-S2",
language = "English",
volume = "7",
journal = "BMC Systems Biology",
issn = "1752-0509",
publisher = "BioMed Central",

}

TY - JOUR

T1 - KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics

AU - Kotera, Masaaki

AU - Tabei, Yasuo

AU - Yamanishi, Yoshihiro

AU - Moriya, Yuki

AU - Tokimatsu, Toshiaki

AU - Kanehisa, Minoru

AU - Goto, Susumu

PY - 2013/1/1

Y1 - 2013/1/1

UR - http://www.scopus.com/inward/record.url?scp=84902497562&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902497562&partnerID=8YFLogxK

U2 - 10.1186/1752-0509-7-S6-S2

DO - 10.1186/1752-0509-7-S6-S2

M3 - Article

C2 - 24564846

AN - SCOPUS:84902497562

VL - 7

JO - BMC Systems Biology

JF - BMC Systems Biology

SN - 1752-0509

M1 - S2

ER -