Alternative splicing of RNA triplets is often regulated and accelerates proteome evolution

PLoS Biol. 2012 Jan;10(1):e1001229. doi: 10.1371/journal.pbio.1001229. Epub 2012 Jan 3.

Abstract

Thousands of human genes contain introns ending in NAGNAG (N any nucleotide), where both NAGs can function as 3' splice sites, yielding isoforms that differ by inclusion/exclusion of three bases. However, few models exist for how such splicing might be regulated, and some studies have concluded that NAGNAG splicing is purely stochastic and nonfunctional. Here, we used deep RNA-Seq data from 16 human and eight mouse tissues to analyze the regulation and evolution of NAGNAG splicing. Using both biological and technical replicates to estimate false discovery rates, we estimate that at least 25% of alternatively spliced NAGNAGs undergo tissue-specific regulation in mammals, and alternative splicing of strongly tissue-specific NAGNAGs was 10 times as likely to be conserved between species as was splicing of non-tissue-specific events, implying selective maintenance. Preferential use of the distal NAG was associated with distinct sequence features, including a more distal location of the branch point and presence of a pyrimidine immediately before the first NAG, and alteration of these features in a splicing reporter shifted splicing away from the distal site. Strikingly, alignments of orthologous exons revealed a ∼15-fold increase in the frequency of three base pair gaps at 3' splice sites relative to nearby exon positions in both mammals and Drosophila. Alternative splicing of NAGNAGs in humans was associated with dramatically increased frequency of exon length changes at orthologous exon boundaries in rodents, and a model involving point mutations that create, destroy, or alter NAGNAGs can explain both the increased frequency and biased codon composition of gained/lost sequence observed at the beginnings of exons. This study shows that NAGNAG alternative splicing generates widespread differences between the proteomes of mammalian tissues, and suggests that the evolutionary trajectories of mammalian proteins are strongly biased by the locations and phases of the introns that interrupt coding sequences.
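
As an illustration of the NAGNAG definition used in the abstract, the following minimal Python sketch checks whether an intron ends in NAGNAG and returns the two downstream exon sequences that result from choosing the first (proximal) or second (distal) NAG as the 3' splice site. It is not the authors' pipeline; the function names `is_nagnag` and `nagnag_isoforms` and the example sequences are hypothetical, chosen only for this example.

```python
from __future__ import annotations

import re

# Intron 3' end of the form NAGNAG, where N is any of A, C, G, T.
NAGNAG_END = re.compile(r"[ACGT]AG[ACGT]AG$")


def _normalize(seq: str) -> str:
    """Upper-case a sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def is_nagnag(intron: str) -> bool:
    """Return True if the intron sequence ends in NAGNAG."""
    return bool(NAGNAG_END.search(_normalize(intron)))


def nagnag_isoforms(intron: str, exon: str) -> tuple[str, str]:
    """Return the downstream exon for (proximal, distal) splice site use.

    Splicing at the proximal (first) NAG leaves the second NAG in the
    exon; splicing at the distal (second) NAG removes it with the intron.
    The two isoforms therefore differ by exactly three bases.
    """
    intron, exon = _normalize(intron), _normalize(exon)
    if not is_nagnag(intron):
        raise ValueError("intron does not end in NAGNAG")
    proximal = intron[-3:] + exon  # second NAG retained in the exon
    distal = exon                  # second NAG removed with the intron
    return proximal, distal


# Hypothetical sequences, for illustration only.
example_intron = "GTAAGTTTTTCAGCAG"
example_exon = "GTTCCA"
proximal_isoform, distal_isoform = nagnag_isoforms(example_intron, example_exon)
```

Because the two isoforms differ by a multiple of three bases, the choice of splice site leaves the downstream reading frame unchanged.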

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Evolution, Molecular*
  • Exons / genetics
  • Female
  • Gene Expression Profiling
  • HEK293 Cells
  • Humans
  • Introns / genetics
  • Male
  • Mice
  • Models, Genetic
  • Molecular Sequence Data
  • Nerve Tissue Proteins / genetics
  • Oligonucleotide Array Sequence Analysis
  • Polypyrimidine Tract-Binding Protein / genetics
  • Protein Isoforms / genetics
  • Proteome / genetics*
  • RNA Splice Sites / genetics*
  • RNA Splicing*
  • Reverse Transcriptase Polymerase Chain Reaction
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid

Substances

  • Nerve Tissue Proteins
  • PTBP2 protein, human
  • Protein Isoforms
  • Proteome
  • Ptbp2 protein, mouse
  • RNA Splice Sites
  • Polypyrimidine Tract-Binding Protein

Associated data

  • GEO/GSE30017