Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes

Mikita Suyama, Eoghan Harrington, Peer Bork, David Torrents

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as of the fate of duplicated genes.
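The classification step mentioned in the abstract (sorting duplicates into functional and inactive copies by the integrity of their coding regions) can be illustrated with a minimal sketch. The abstract does not state the actual criteria, so the rule below is an assumption made for illustration only: a copy is treated as inactive if its coding sequence is empty, is not a multiple of three in length (a likely frameshift), or has an in-frame stop codon before the final codon. The function name `classify_duplicate` and the example sequences are hypothetical.

```python
# Hypothetical illustration: the criteria below are assumptions,
# not the method used in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_duplicate(cds: str) -> str:
    """Label a duplicated coding sequence as 'functional' or 'inactive'.

    Assumed rule: 'inactive' if the sequence is empty, its length is not a
    multiple of three, or an in-frame stop codon occurs before the last codon.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return "inactive"
    # Split the sequence into consecutive in-frame codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon anywhere except the final position truncates the product.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "inactive"
    return "functional"


if __name__ == "__main__":
    for name, cds in [
        ("intact copy", "ATGGCCTAA"),
        ("premature stop", "ATGTAAGCCTAA"),
        ("frameshifted copy", "ATGGCTAA"),
    ]:
        print(name, "->", classify_duplicate(cds))
```

A real pipeline would compare each duplicate against its source gene through an alignment instead of inspecting the copy in isolation; this sketch only shows the kind of integrity test the abstract alludes to.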

Original language: English
Pages (from-to): 627-636
Number of pages: 10
Journal: PLoS Computational Biology
Volume: 2
Issue number: 6
DOI: 10.1371/journal.pcbi.0020076
Publication status: Published - Jul 10 2006

Fingerprint

Pseudogenes
Human Genome
Mouse
Genome
Genes
Duplicate Genes
Molecular Sequence Annotation
Annotation
Gene Duplication
Human
Analysis
Duplication

All Science Journal Classification (ASJC) codes

  • Cellular and Molecular Neuroscience
  • Ecology
  • Molecular Biology
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Modelling and Simulation
  • Computational Theory and Mathematics

Cite this

Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes. / Suyama, Mikita; Harrington, Eoghan; Bork, Peer; Torrents, David.

In: PLoS Computational Biology, Vol. 2, No. 6, 10.07.2006, p. 627-636.

@article{ce517d7908d847b0b482611305922b69,
title = "Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes",
abstract = "The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as of the fate of duplicated genes.",
author = "Mikita Suyama and Eoghan Harrington and Peer Bork and David Torrents",
year = "2006",
month = "7",
day = "10",
doi = "10.1371/journal.pcbi.0020076",
language = "English",
volume = "2",
pages = "627--636",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "6",
}

TY - JOUR
T1 - Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes
AU - Suyama, Mikita
AU - Harrington, Eoghan
AU - Bork, Peer
AU - Torrents, David
PY - 2006/7/10
Y1 - 2006/7/10

N2 - The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as of the fate of duplicated genes.

UR - http://www.scopus.com/inward/record.url?scp=33745613526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745613526&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.0020076
DO - 10.1371/journal.pcbi.0020076
M3 - Article
C2 - 16846249
AN - SCOPUS:33745613526
VL - 2
SP - 627
EP - 636
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 6
ER -