Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data

Chem Res Toxicol. 2014 Sep 15;27(9):1528-36. doi: 10.1021/tx500148n. Epub 2014 Aug 14.

Abstract

Toxicogenomics (TGx) endeavors to elucidate underlying molecular mechanisms by exploring gene expression profiles in response to toxic substances. RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing the full potential of RNA-Seq requires novel approaches to extracting information from complex TGx data. If read counts are treated as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, could therefore help explore RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same mode of action (MoA) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted in terms of MoA by gene enrichment analysis and then confirmed by compound-pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discovering functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
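The analogy between read counts and word counts can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, sample names, and counts are invented toy values, the helper names `counts_to_document` and `lda_gibbs` are hypothetical, and the collapsed Gibbs sampler with its hyperparameters (`alpha`, `beta`) is one common way to fit LDA, chosen here only because it needs nothing beyond the Python standard library. The abstract does not state which inference method or settings were used.

```python
import random
from collections import Counter


def counts_to_document(read_counts, scale=1):
    """Turn {gene: read_count} into a list of gene "words".

    Each gene appears once per (scaled) read. `scale` is a hypothetical
    knob for shrinking large real-world counts; it is not from the paper.
    """
    doc = []
    for gene, count in read_counts.items():
        doc.extend([gene] * (count // scale))
    return doc


def lda_gibbs(docs, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed method, toy scale only).

    Returns (theta, phi): per-document topic proportions and
    per-topic gene distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]         # counts per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # counts per (topic, gene)
    topic_total = [0] * n_topics                       # counts per topic
    assign = []                                        # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic given all others.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + n_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta)
         for w in vocab}
        for k in range(n_topics)
    ]
    return theta, phi


# Toy expression profiles (invented values, for illustration only).
samples = {
    "treated_1": {"geneA": 30, "geneB": 25, "geneC": 2, "geneD": 1},
    "treated_2": {"geneA": 28, "geneB": 27, "geneC": 1, "geneD": 3},
    "control_1": {"geneA": 2, "geneB": 1, "geneC": 26, "geneD": 29},
    "control_2": {"geneA": 1, "geneB": 3, "geneC": 31, "geneD": 24},
}
docs = [counts_to_document(counts) for counts in samples.values()]
theta, phi = lda_gibbs(docs, n_topics=2)

for name, proportions in zip(samples, theta):
    print(name, [round(p, 3) for p in proportions])
```

Each row of `theta` gives one sample's topic proportions, which is the kind of representation on which samples could be compared and clustered; each entry of `phi` gives a topic's distribution over genes, which is what a gene enrichment analysis would take as input. Real RNA-Seq data would need a far more efficient implementation than this token-by-token sampler.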

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cluster Analysis
  • Oligonucleotide Array Sequence Analysis
  • Peroxisome Proliferator-Activated Receptors / genetics
  • Peroxisome Proliferator-Activated Receptors / metabolism
  • RNA / chemistry*
  • Receptors, Estrogen / genetics
  • Receptors, Estrogen / metabolism
  • Sequence Analysis, RNA
  • Signal Transduction
  • Toxicogenetics*
  • Transcriptome

Substances

  • Peroxisome Proliferator-Activated Receptors
  • Receptors, Estrogen
  • RNA