Evidence suggesting that a fifth of annotated Caenorhabditis elegans genes may be pseudogenes

Genome Res. 2002 May;12(5):770-5. doi: 10.1101/gr.208802.

Abstract

Only a minority of the genes identified by computer analysis of the Caenorhabditis elegans genome sequence data have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. A low success rate was obtained for evolutionarily recently duplicated genes. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences within families of recently duplicated C. elegans genes. Although this analysis was only preliminary, it found clear evidence of recent inactivation by genetic drift for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in annotating the genomes of other species, where pseudogenes may be fewer but harder to detect.

Publication types

  • Letter
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Artificial Gene Fusion / methods
  • Caenorhabditis elegans / genetics*
  • Caenorhabditis elegans Proteins / genetics
  • Computational Biology / methods*
  • Gene Expression Regulation
  • Genes, Helminth / genetics*
  • Genes, Reporter / genetics
  • Molecular Sequence Data
  • Pseudogenes / genetics*

Substances

  • Caenorhabditis elegans Proteins