ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel

Biostatistics. 2023 Apr 14;24(2):388-405. doi: 10.1093/biostatistics/kxab013.

Abstract

The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
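
To make the idea of Dirichlet Multinomial observations concrete, the following is a minimal sketch in Python and not the ACTOR implementation or its R API. It assumes a single gene, known Dirichlet concentration parameters for each reference group, and a uniform prior over groups, and it computes exact posterior group probabilities with Bayes' rule rather than the variational Bayes procedure used in the paper. The function names, group labels, and numeric values are hypothetical and chosen only for illustration.

```python
from math import exp, lgamma, log


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of isoform counts under a Dirichlet-multinomial.

    counts: observed read counts per isoform for one gene.
    alpha:  positive Dirichlet concentration parameters, one per isoform.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_i!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalising ratio: Gamma(a) / Gamma(n + a)
    log_ratio = lgamma(a) - lgamma(n + a)
    # Per-isoform terms: Gamma(x_i + alpha_i) / Gamma(alpha_i)
    log_terms = sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_coef + log_ratio + log_terms


def group_posterior(counts, group_alphas, prior=None):
    """Posterior probability of each reference group given the counts.

    group_alphas: dict mapping a (hypothetical) group label to its
                  concentration parameters for this gene.
    prior:        optional dict of prior group probabilities; uniform if None.
    """
    groups = list(group_alphas)
    if prior is None:
        prior = {g: 1.0 / len(groups) for g in groups}
    log_post = {
        g: log(prior[g]) + dirichlet_multinomial_logpmf(counts, group_alphas[g])
        for g in groups
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    weights = {g: exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical two-isoform gene: one group favours isoform 1, the other isoform 2.
reference = {"group_A": [9.0, 1.0], "group_B": [1.0, 9.0]}
posterior = group_posterior([90, 10], reference)
```

In this toy setup the sample's counts are dominated by the first isoform, so the posterior mass concentrates on the group whose concentration parameters favour that isoform. A full analysis along the lines described in the abstract would combine evidence across many genes and estimate the group memberships jointly.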

Keywords: Gene class; Isoform splicing; LDA; Variational Bayes.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem*
  • Humans
  • Protein Isoforms / analysis
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • Sequence Analysis, RNA / methods

Substances

  • Protein Isoforms