Pathway analysis for RNA-Seq data using a score-based approach

Biometrics. 2016 Mar;72(1):165-74. doi: 10.1111/biom.12372. Epub 2015 Aug 10.

Abstract

A variety of pathway/gene-set approaches have been proposed to provide evidence of higher-level biological phenomena in the association of expression with experimental condition or clinical outcome. Among these approaches, it has been repeatedly shown that resampling methods are far preferable to approaches that implicitly assume independence of genes. However, few approaches have been optimized for the specific characteristics of RNA-Seq transcription data, in which mapped tags produce discrete counts with varying library sizes, and with potential outliers or skewness patterns that violate parametric assumptions. We describe transformations to RNA-Seq data to improve power for linear associations with outcome and flexibly handle normalization factors. Using these transformations or alternate transformations, we apply recently developed null approximations to quadratic form statistics for both self-contained and competitive pathway testing. The approach provides a convenient integrated platform for RNA-Seq pathway testing. We demonstrate that the approach provides appropriate type I error control without actual permutation and is powerful under many settings in comparison to competing approaches. Pathway analysis of data from a study of F344 vs. HIV1Tg rats, and of sex differences in lymphoblastoid cell lines from humans, strongly supports the biological interpretability of the findings.

Keywords: Linear model; Pathway analysis; RNA-seq; Statistical genetics.
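As an illustration of the workflow summarized in the abstract (transform the counts, score each gene for linear association with outcome, combine the scores into a quadratic-form pathway statistic), the following is a minimal sketch only. It rests on several assumptions that do not come from the paper: the log counts-per-million transform stands in for the authors' transformations, the per-gene score is a plain Pearson correlation, and the null distribution is obtained by explicit permutation of the outcome, whereas the paper reports using analytic null approximations that avoid actual permutation. All function names (log_cpm, gene_scores, pathway_stat, self_contained_pvalue) are hypothetical, and only self-contained testing is shown.

```python
import numpy as np


def log_cpm(counts, lib_sizes, prior=0.5):
    """Log2 counts-per-million with a small offset (assumed transform, not the paper's).

    counts: genes x samples array of raw counts.
    lib_sizes: per-sample library sizes (length = number of samples).
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = np.asarray(lib_sizes, dtype=float)
    return np.log2((counts + prior) / (lib_sizes + 1.0) * 1e6)


def gene_scores(expr, y):
    """Pearson correlation of each gene (row of expr) with the outcome y."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    xc = expr - expr.mean(axis=1, keepdims=True)
    num = xc @ yc
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    # Genes with zero variance get a score of 0 instead of dividing by zero.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def pathway_stat(expr, y, idx):
    """Quadratic-form statistic: sum of squared gene scores within the pathway."""
    return float(np.sum(gene_scores(expr[idx], y) ** 2))


def self_contained_pvalue(expr, y, idx, n_perm=2000, seed=0):
    """Self-contained test via outcome permutation (a stand-in for the
    analytic null approximation described in the abstract)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = pathway_stat(expr, y, idx)
    sub = expr[idx]
    null = np.array(
        [np.sum(gene_scores(sub, rng.permutation(y)) ** 2) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Permuting the outcome keeps the correlation among genes in the pathway intact, which is the property the abstract cites in favoring resampling-type nulls over methods that assume gene independence. A competitive test, which compares pathway genes against the remaining genes, would need a different reference distribution and is not covered by this sketch.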

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Animals
  • Data Interpretation, Statistical
  • Data Mining / methods*
  • Databases, Nucleic Acid*
  • Female
  • High-Throughput Nucleotide Sequencing / methods*
  • Male
  • Protein Interaction Mapping / methods
  • Rats
  • Sequence Analysis, RNA / methods*
  • Sex Factors
  • Signal Transduction / genetics*
  • Species Specificity
  • Transcription Factors / genetics*

Substances

  • Transcription Factors