KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters

Akihiro Nakaya, Toshiaki Katayama, Masumi Itoh, Kazushi Hiranuka, Shuichi Kawashima, Yuki Moriya, Shujiro Okuda, Michihiro Tanaka, Toshiaki Tokimatsu, Yoshihiro Yamanishi, Akiyasu C. Yoshizawa, Minoru Kanehisa, Susumu Goto

Research output: Contribution to journal › Article

59 Citations (Scopus)

Abstract

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. The OCs are computationally efficient to calculate, which allows the contents to be updated regularly. KEGG OC has two main features: (i) it covers all complete genomes of a wide variety of organisms from the three domains of life, and the number of organisms is the largest among existing databases; and (ii) it is compatible with the KEGG database, sharing the same sets of genes and identifiers, which allows seamless integration of OCs with useful KEGG components such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via the OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.
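
The abstract names a quasi-clique-based clustering method but gives no algorithmic detail, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a degree-based definition of a quasi-clique (every member is linked to at least a fraction `gamma` of the other members) and a simple greedy growth strategy. The function names, the `gamma` threshold, and the toy gene identifiers are all hypothetical.

```python
def is_quasi_clique(members, edges, gamma=0.5):
    """Return True if every member is linked to at least gamma * (n - 1) other members.

    `edges` is a set of frozenset pairs; a pair is present when two genes are
    considered similar (the similarity cutoff itself is assumed, not from the paper).
    """
    members = set(members)
    n = len(members)
    if n < 2:
        return True
    needed = gamma * (n - 1)
    for gene in members:
        degree = sum(
            1 for other in members
            if other != gene and frozenset((gene, other)) in edges
        )
        if degree < needed:
            return False
    return True


def greedy_clusters(genes, edges, gamma=0.5):
    """Greedily grow clusters, adding a gene only if the cluster stays a quasi-clique."""
    unassigned = list(genes)
    clusters = []
    while unassigned:
        cluster = [unassigned.pop(0)]
        # Iterate over a copy because genes are removed from `unassigned` on the way.
        for gene in list(unassigned):
            if is_quasi_clique(cluster + [gene], edges, gamma):
                cluster.append(gene)
                unassigned.remove(gene)
        clusters.append(cluster)
    return clusters


# Hypothetical toy data: two groups of mutually similar genes.
edges = {
    frozenset(pair)
    for pair in [("a1", "b1"), ("b1", "c1"), ("a1", "c1"), ("a2", "b2")]
}
genes = ["a1", "b1", "c1", "a2", "b2"]
clusters = greedy_clusters(genes, edges, gamma=0.5)
print(clusters)
```

Unlike a strict clique, a quasi-clique tolerates some missing similarity links inside a cluster, which is one reason such relaxed definitions are attractive for noisy sequence-similarity data. A greedy pass like this one depends on input order and is only meant to illustrate the membership test.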

Original language: English
Pages (from-to): D353-D357
Journal: Nucleic Acids Research
Volume: 41
Issue number: D1
DOIs: https://doi.org/10.1093/nar/gks1239
Publication status: Published - Jan 1 2013

Fingerprint

Genome
Databases
Cluster Analysis
Genes
Archaea
Eukaryota
Amino Acid Sequence
Bacteria
Pharmaceutical Preparations
Proteins

All Science Journal Classification (ASJC) codes

  • Genetics

Cite this

Nakaya, A., Katayama, T., Itoh, M., Hiranuka, K., Kawashima, S., Moriya, Y., ... Goto, S. (2013). KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters. Nucleic acids research, 41(D1), D353-D357. https://doi.org/10.1093/nar/gks1239

KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters. / Nakaya, Akihiro; Katayama, Toshiaki; Itoh, Masumi; Hiranuka, Kazushi; Kawashima, Shuichi; Moriya, Yuki; Okuda, Shujiro; Tanaka, Michihiro; Tokimatsu, Toshiaki; Yamanishi, Yoshihiro; Yoshizawa, Akiyasu C.; Kanehisa, Minoru; Goto, Susumu.

In: Nucleic acids research, Vol. 41, No. D1, 01.01.2013, p. D353-D357.

Nakaya, A, Katayama, T, Itoh, M, Hiranuka, K, Kawashima, S, Moriya, Y, Okuda, S, Tanaka, M, Tokimatsu, T, Yamanishi, Y, Yoshizawa, AC, Kanehisa, M & Goto, S 2013, 'KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters', Nucleic acids research, vol. 41, no. D1, pp. D353-D357. https://doi.org/10.1093/nar/gks1239
Nakaya A, Katayama T, Itoh M, Hiranuka K, Kawashima S, Moriya Y et al. KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters. Nucleic acids research. 2013 Jan 1;41(D1):D353-D357. https://doi.org/10.1093/nar/gks1239
Nakaya, Akihiro; Katayama, Toshiaki; Itoh, Masumi; Hiranuka, Kazushi; Kawashima, Shuichi; Moriya, Yuki; Okuda, Shujiro; Tanaka, Michihiro; Tokimatsu, Toshiaki; Yamanishi, Yoshihiro; Yoshizawa, Akiyasu C.; Kanehisa, Minoru; Goto, Susumu. / KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters. In: Nucleic acids research. 2013; Vol. 41, No. D1. pp. D353-D357.
@article{6466a8bdf56240b9a9bf33ee942ae70f,
title = "KEGG OC: A large-scale automatic construction of taxonomy-based ortholog clusters",
abstract = "The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. It is computationally efficient to calculate OCs, which enables to regularly update the contents. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer that provides an interactive visualization of OCs at different taxonomic levels.",
author = "Akihiro Nakaya and Toshiaki Katayama and Masumi Itoh and Kazushi Hiranuka and Shuichi Kawashima and Yuki Moriya and Shujiro Okuda and Michihiro Tanaka and Toshiaki Tokimatsu and Yoshihiro Yamanishi and Yoshizawa, {Akiyasu C.} and Minoru Kanehisa and Susumu Goto",
year = "2013",
month = "1",
day = "1",
doi = "10.1093/nar/gks1239",
language = "English",
volume = "41",
pages = "D353--D357",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "D1",

}

TY  - JOUR
T1  - KEGG OC
T2  - A large-scale automatic construction of taxonomy-based ortholog clusters
AU  - Nakaya, Akihiro
AU  - Katayama, Toshiaki
AU  - Itoh, Masumi
AU  - Hiranuka, Kazushi
AU  - Kawashima, Shuichi
AU  - Moriya, Yuki
AU  - Okuda, Shujiro
AU  - Tanaka, Michihiro
AU  - Tokimatsu, Toshiaki
AU  - Yamanishi, Yoshihiro
AU  - Yoshizawa, Akiyasu C.
AU  - Kanehisa, Minoru
AU  - Goto, Susumu
PY  - 2013/1/1
Y1  - 2013/1/1
N2  - The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. It is computationally efficient to calculate OCs, which enables to regularly update the contents. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer that provides an interactive visualization of OCs at different taxonomic levels.
AB  - The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. It is computationally efficient to calculate OCs, which enables to regularly update the contents. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer that provides an interactive visualization of OCs at different taxonomic levels.
UR  - http://www.scopus.com/inward/record.url?scp=84876517441&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84876517441&partnerID=8YFLogxK
U2  - 10.1093/nar/gks1239
DO  - 10.1093/nar/gks1239
M3  - Article
C2  - 23193276
AN  - SCOPUS:84876517441
VL  - 41
SP  - D353-D357
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
SN  - 0305-1048
IS  - D1
ER  -