Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data

Genome Res. 2001 Sep;11(9):1520-6. doi: 10.1101/gr.190501.

Abstract

Alternate polyadenylation affects a large fraction of higher eucaryote mRNAs, producing mature transcripts with 3' ends of variable length. This variation is poorly represented in the current transcript catalogs derived from whole genome sequences, mostly because such posttranscriptional events are not detectable directly at the DNA level. Alternate polyadenylation of an mRNA is better understood by comparison to EST databases. Comparing ESTs to mRNAs, however, is a difficult task subject to several pitfalls: internal priming, the presence of intron sequences, repeated elements, chimeric ESTs, and matches with ESTs from paralogous genes. We present here a computer program that addresses these problems and displays EST matches to a query mRNA sequence in order to predict alternate polyadenylation and to suggest library-specific forms. The output highlights effective polyadenylation signals and possible sources of artifacts, such as A-rich stretches in the mRNA sequences, and allows direct visualization of EST libraries using color codes. Statistical biases in the distribution of alternative mRNA forms among EST libraries were systematically sought. About 1450 human and 200 mouse mRNAs displayed such biases, suggesting in each case a tissue- or disease-specific regulation of polyadenylation.
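As an illustration of the two sequence features the output is said to highlight, the following minimal Python sketch locates candidate polyadenylation signals and flags A-rich stretches that could cause internal priming. It is not the authors' program. The abstract does not state which signals or thresholds were used, so the hexamer list (the widely reported AATAAA and ATTAAA), the function names `find_polya_signals` and `find_a_rich_stretches`, and the `window` and `min_a` parameters are all assumptions made for this example.

```python
# Assumption: only the two most common hexamers are scanned;
# the abstract does not list the signals the program recognizes.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(mrna, signals=POLYA_SIGNALS):
    """Return sorted (position, hexamer) pairs (0-based) for every occurrence."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for signal in signals:
        start = seq.find(signal)
        while start != -1:
            hits.append((start, signal))
            start = seq.find(signal, start + 1)  # allow overlapping hits
    return sorted(hits)


def find_a_rich_stretches(mrna, window=10, min_a=8):
    """Return merged half-open (start, end) intervals covered by windows
    of length `window` that contain at least `min_a` adenines.

    Assumption: `window` and `min_a` are illustrative thresholds only.
    """
    seq = mrna.upper()
    stretches = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("A") >= min_a:
            if stretches and i <= stretches[-1][1]:
                # Window overlaps or touches the previous interval: extend it.
                stretches[-1] = (stretches[-1][0], i + window)
            else:
                stretches.append((i, i + window))
    return stretches


if __name__ == "__main__":
    # Hypothetical toy sequence: one signal, then a 12-base A run.
    mrna = "GCGC" + "AATAAA" + "GC" * 6 + "A" * 12 + "GCGC"
    print(find_polya_signals(mrna))
    print(find_a_rich_stretches(mrna))
```

In such a sketch, an EST whose 3' end aligns just upstream of a flagged A-rich interval, rather than downstream of a signal hexamer, would be a candidate internal-priming artifact instead of evidence for a genuine alternate poly(A) site.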

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / genetics
  • Animals
  • Computational Biology
  • Databases, Factual
  • Expressed Sequence Tags*
  • Gene Library
  • Humans
  • Mice
  • Organ Specificity / genetics*
  • Poly A / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*

Substances

  • 3' Untranslated Regions
  • Poly A