The disparate nature of "intergenic" polyadenylation sites

RNA. 2006 Oct;12(10):1794-801. doi: 10.1261/rna.136206. Epub 2006 Aug 24.

Abstract

The termination of mature eukaryotic mRNAs occurs at specific polyadenylation sites located downstream from stop codons in the 3'-untranslated region (UTR). An accurate delineation of these sites is essential for the study of 3'-UTR-based gene regulation and for the design of pertinent probes for transcriptome analysis. Although typical poly(A) sites are located between 0 and 2 kb from the stop codon, EST sequence analyses have identified sites located at unexpectedly long ranges (5-10 kb) in a number of genes. Here we perform a complete mapping of EST and full-length cDNA sequences on the mouse and human genomes to observe putative poly(A) sites extending beyond annotated 3'-ends and into the intergenic regions. We introduce several quality parameters for poly(A) site prediction and train a classification tree to assign P-values to predicted sites. We observe a higher than background level of high-scoring sites up to 12-15 kb past the stop codon, both in human and mouse. This leads to an estimate of about 5000 human genes having unreported 3'-end extensions and about 3500 novel polyadenylated transcripts lying in present "intergenic" regions. These high-scoring, long-range poly(A) sites corresponding to novel transcripts and gene extensions should be incorporated into current human and mouse gene repositories.
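
The comparison of high-scoring sites against a background level, by distance past the stop codon, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the quality parameters, the classification tree, or how background was estimated. The function names, the 3 kb bin width, the score threshold, and the background rate below are all hypothetical choices made for illustration only.

```python
from collections import defaultdict


def high_score_fraction_by_bin(sites, bin_kb=3.0, threshold=0.8):
    """Fraction of high-scoring predicted sites per distance bin.

    sites: iterable of (distance_kb, score) pairs, where distance_kb is the
    distance of a predicted poly(A) site past the stop codon and score is a
    classifier score in [0, 1]. Both are hypothetical inputs.
    bin_kb and threshold are illustrative values, not taken from the paper.
    Returns {bin_start_kb: fraction of sites in that bin with score >= threshold}.
    """
    totals = defaultdict(int)
    high = defaultdict(int)
    for distance_kb, score in sites:
        bin_start = int(distance_kb // bin_kb) * bin_kb
        totals[bin_start] += 1
        if score >= threshold:
            high[bin_start] += 1
    return {b: high[b] / totals[b] for b in sorted(totals)}


def last_enriched_bin(fractions, background):
    """Start of the farthest bin whose high-score fraction exceeds background.

    background is an assumed, externally estimated rate of high-scoring sites
    in regions with no expected transcripts. Returns None if no bin exceeds it.
    """
    enriched = [b for b, f in fractions.items() if f > background]
    return max(enriched) if enriched else None


if __name__ == "__main__":
    # Toy data: (distance past stop codon in kb, classifier score).
    toy_sites = [
        (0.5, 0.90), (1.0, 0.95), (2.0, 0.20),
        (4.0, 0.90), (5.0, 0.10),
        (13.0, 0.85), (14.0, 0.30),
        (20.0, 0.10), (22.0, 0.20),
    ]
    fractions = high_score_fraction_by_bin(toy_sites)
    print(fractions)
    print(last_enriched_bin(fractions, background=0.25))
```

In this toy setup, the farthest bin whose high-score fraction stays above the assumed background plays the role of the 12-15 kb range reported in the abstract; a real analysis would also need a significance test per bin rather than a simple comparison of fractions.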

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / chemistry
  • 3' Untranslated Regions / genetics
  • 3' Untranslated Regions / metabolism
  • Animals
  • Base Sequence
  • Binding Sites / genetics
  • Codon, Terminator / genetics
  • Computational Biology
  • DNA, Intergenic / genetics
  • Expressed Sequence Tags
  • Humans
  • Mice
  • RNA, Messenger / chemistry*
  • RNA, Messenger / genetics*
  • RNA, Messenger / metabolism
  • Transcription, Genetic

Substances

  • 3' Untranslated Regions
  • Codon, Terminator
  • DNA, Intergenic
  • RNA, Messenger