Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments

Nucleic Acids Res. 2010 Jun;38(10):e112. doi: 10.1093/nar/gkq041. Epub 2010 Feb 11.

Abstract

Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing to specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to address the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R-package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/.
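The general idea of predicting skipped exons from exon expression levels alone can be illustrated with a minimal sketch. This is not the statistical model implemented in Solas, which the abstract does not specify. It is a simplified heuristic, written in Python for illustration, that compares each exon's read density with the median density across the gene. The function names, the pseudocount and the log2 threshold are all hypothetical choices made for this example.

```python
from math import log2
from statistics import median


def exon_densities(counts, lengths):
    """Return reads per kilobase for each exon (counts and lengths in bp)."""
    return [c / (length / 1000.0) for c, length in zip(counts, lengths)]


def flag_low_exons(counts, lengths, min_log2_drop=1.0, pseudo=1.0):
    """Return indices of exons whose density falls well below the gene median.

    Hypothetical heuristic: an exon is flagged as a skipped-exon candidate
    when log2((median + pseudo) / (density + pseudo)) >= min_log2_drop.
    The threshold and pseudocount are illustrative assumptions, not values
    taken from the paper.
    """
    densities = exon_densities(counts, lengths)
    reference = median(densities)  # robust gene-level reference density
    flagged = []
    for index, density in enumerate(densities):
        drop = log2((reference + pseudo) / (density + pseudo))
        if drop >= min_log2_drop:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Toy gene with four 1 kb exons; the third has markedly fewer reads.
    counts = [200, 210, 40, 190]
    lengths = [1000, 1000, 1000, 1000]
    print(flag_low_exons(counts, lengths))
```

In this toy input, only the third exon (index 2) lies more than twofold below the gene median and is reported as a candidate. A real analysis would additionally need to account for sampling noise, mappability and replicate variation, which a fixed fold-change cutoff does not.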

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Alternative Splicing*
  • Cell Line
  • Computer Simulation
  • Exons
  • Expressed Sequence Tags
  • Gene Expression Profiling*
  • Humans
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Protein Isoforms / genetics*
  • Protein Isoforms / metabolism
  • Sequence Analysis, RNA*

Substances

  • Protein Isoforms

Associated data

  • GEO/GSE13474