Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context

Bioinformatics. 2011 Sep 15;27(18):2554-62. doi: 10.1093/bioinformatics/btr444. Epub 2011 Jul 29.

Abstract

Motivation: Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a 'splicing code' that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers.

Methods: We formulate the assembly of a splicing code as a problem of statistical inference and introduce a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features.
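To make "uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features" concrete, the sketch below shows the general idea on a toy problem. It is not the authors' model: the abstract does not specify the hidden-variable network or the sharing of feature subgroups across tissues, so a simpler stand-in is used, namely linear regression with a spike-and-slab prior on the feature weights. All names (`gibbs_spike_slab`, `hedged_predict`, `make_toy_data`) and all settings (known noise variance `sigma2`, prior inclusion probability `pi`, slab variance `tau2`) are assumptions made for illustration. Hedging here means averaging predictions over posterior samples instead of relying on a single fitted model, and the fraction of samples that include a feature serves as a rough measure of its significance.

```python
import math
import random


def make_toy_data(n, p=6, seed=1):
    """Synthetic data (hypothetical): only the first two features matter."""
    rng = random.Random(seed)
    true_w = [2.0, -1.5] + [0.0] * (p - 2)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.5) for row in X]
    return X, y


def gibbs_spike_slab(X, y, n_iter=600, burn_in=200,
                     pi=0.2, tau2=1.0, sigma2=0.25, seed=0):
    """Gibbs sampler for linear regression with a spike-and-slab prior.

    Assumed model: weight_j = 0 with prob. 1 - pi, else N(0, tau2);
    Gaussian noise with known variance sigma2.
    Returns (inclusion_probabilities, posterior_weight_samples).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)          # y - X @ beta, updated incrementally
    included = [0] * p
    samples = []
    prior_log_odds = math.log(pi / (1.0 - pi))
    for it in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Residual with feature j's current contribution removed.
            r = [resid[i] + col[i] * beta[j] for i in range(n)]
            b = sum(col[i] * r[i] for i in range(n)) / sigma2
            prec = xtx[j] / sigma2 + 1.0 / tau2   # posterior precision of weight j
            # Log posterior odds of including feature j, weight integrated out.
            log_odds = (prior_log_odds
                        - 0.5 * math.log(tau2 * prec)
                        + 0.5 * b * b / prec)
            p_in = 0.5 * (1.0 + math.tanh(0.5 * log_odds))  # stable sigmoid
            if rng.random() < p_in:
                beta[j] = rng.gauss(b / prec, math.sqrt(1.0 / prec))
            else:
                beta[j] = 0.0
            resid = [r[i] - col[i] * beta[j] for i in range(n)]
        if it >= burn_in:
            samples.append(list(beta))
            for j in range(p):
                included[j] += 1 if beta[j] != 0.0 else 0
    kept = len(samples)
    return [c / kept for c in included], samples


def hedged_predict(X_new, samples):
    """Average the prediction of every posterior sample (model averaging)."""
    preds = []
    for row in X_new:
        total = sum(sum(x * w for x, w in zip(row, beta)) for beta in samples)
        preds.append(total / len(samples))
    return preds


if __name__ == "__main__":
    X, y = make_toy_data(250)
    incl, samples = gibbs_spike_slab(X[:200], y[:200])
    print("posterior inclusion probabilities:", [round(q, 2) for q in incl])
    print("first hedged predictions:", hedged_predict(X[200:203], samples))
```

In this toy setting, features whose inclusion probability stays near zero across samples would be treated as unsupported by the data, while the averaged prediction reflects uncertainty about which features belong in the model.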

Results: Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (http://genes.toronto.edu/wasp), we benchmarked several methods. Our method outperforms all others, and achieves relative improvements of 52% in splicing code quality and up to 22% in classification error, compared with the state of the art. Novel combinations of regulatory features and novel combinations of tissues that share feature subgroups were identified using our method.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alternative Splicing / genetics*
  • Animals
  • Base Sequence
  • Bayes Theorem
  • Exons
  • Gene Expression
  • Gene Expression Regulation
  • Humans
  • Mice
  • Models, Genetic
  • RNA / genetics*
  • RNA Isoforms / genetics*
  • RNA Splicing
  • Transcription, Genetic

Substances

  • RNA Isoforms
  • RNA