A probabilistic model of 3' end formation in Caenorhabditis elegans

Nucleic Acids Res. 2004 Jun 24;32(11):3392-9. doi: 10.1093/nar/gkh656. Print 2004.

Abstract

The 3' ends of mRNAs terminate with a poly(A) tail. This post-transcriptional modification is directed by sequence features present in the 3'-untranslated region (3'-UTR). We have undertaken a computational analysis of 3' end formation in Caenorhabditis elegans. By aligning cDNAs that diverge from genomic sequence at the poly(A) tract, we accurately identified a large set of true cleavage sites. When there are many transcripts aligned to a particular locus, local variation of the cleavage site over a span of a few bases is frequently observed. We find that in addition to the well-known AAUAAA motif there are several regions with distinct nucleotide compositional biases. We propose a generalized hidden Markov model that describes sequence features in C. elegans 3'-UTRs. We find that a computer program employing this model accurately predicts experimentally observed 3' ends even when there are multiple AAUAAA motifs and multiple cleavage sites. We have made available a complete set of polyadenylation site predictions for the C. elegans genome, including a subset of 6570 supported by aligned transcripts.
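
The idea of locating a cleavage site where a cDNA diverges from genomic sequence into a poly(A) tract can be illustrated with a minimal Python sketch. This is not the authors' pipeline or their hidden Markov model. It assumes an ungapped alignment in which the cDNA and the genomic sequence start at the same position, and the function names (`find_cleavage_site`, `upstream_signals`), the minimum tail length of 8, and the 40-base search window are hypothetical choices made only for illustration. Sequences are written as DNA, so the AAUAAA motif appears as AATAAA.

```python
from typing import List, Optional


def find_cleavage_site(genomic: str, cdna: str, min_tail: int = 8) -> Optional[int]:
    """Return the 0-based genomic index of the last base shared by the cDNA
    and the genome before the cDNA continues as a pure run of A, or None.

    Assumption: the two sequences are aligned without gaps from index 0.
    """
    genomic = genomic.upper()
    cdna = cdna.upper()

    # Walk along the alignment until the first mismatch or the end of either sequence.
    i = 0
    while i < len(cdna) and i < len(genomic) and cdna[i] == genomic[i]:
        i += 1

    # The unaligned remainder must be a sufficiently long run of A only.
    tail = cdna[i:]
    if i > 0 and len(tail) >= min_tail and set(tail) == {"A"}:
        # If the genome itself has A bases right after the true cleavage site,
        # the match extends into them, so the reported position is the most
        # downstream position consistent with the data.
        return i - 1
    return None


def upstream_signals(genomic: str, site: int, motif: str = "AATAAA",
                     window: int = 40) -> List[int]:
    """List start indices of the motif within `window` bases upstream of `site`."""
    start = max(0, site - window)
    region = genomic.upper()[start:site + 1]
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(start + pos)
        pos = region.find(motif, pos + 1)  # allow overlapping matches
    return hits


if __name__ == "__main__":
    genome = "GCTTAATAAATGTCTTGTACGTTTCCGGATTC"   # made-up example sequence
    transcript = genome[:20] + "A" * 10          # cDNA ending in a poly(A) tract
    site = find_cleavage_site(genome, transcript)
    print("cleavage site index:", site)
    if site is not None:
        print("upstream AATAAA starts:", upstream_signals(genome, site))
```

Applying such a function to every transcript aligned at a locus and collecting the resulting indices would be one simple way to see the local variation in cleavage position that the abstract describes.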

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / chemistry*
  • 3' Untranslated Regions / metabolism*
  • Animals
  • Base Sequence
  • Caenorhabditis elegans / genetics*
  • Caenorhabditis elegans / metabolism
  • Computational Biology
  • Genes, Helminth
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical*
  • Molecular Sequence Data
  • Nucleotides / analysis
  • Probability
  • RNA 3' Polyadenylation Signals
  • RNA Splice Sites
  • RNA Splicing
  • Regulatory Sequences, Ribonucleic Acid
  • Stochastic Processes

Substances

  • 3' Untranslated Regions
  • Nucleotides
  • RNA Splice Sites
  • Regulatory Sequences, Ribonucleic Acid