Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

PLoS Comput Biol. 2015 Jun 25;11(6):e1004310. doi: 10.1371/journal.pcbi.1004310. eCollection 2015 Jun.

Abstract

Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied to gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here extends gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It uses all available repeated measurements while handling unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly between these conditions. The value of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant change in pattern over time. When applied to the second illustrative dataset, TcGSA identified 4 gene sets that were also linked with the influenza vaccine, although previous analyses had found them to be associated with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and increased power compared with other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package.
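
To make the random effects idea concrete, the following is a minimal sketch of how a mixed model for one gene set could be written. It is an illustration under stated assumptions, not the paper's exact specification: the linear time trend, the symbols, and the choice of random effects are all assumed here, since the abstract states only that the method uses random effects modeling with maximum likelihood estimation and accounts for heterogeneity within gene sets.

```latex
% Illustrative sketch only; notation and model form are assumptions.
% y_{gpt}: expression of gene g (in a given gene set) for patient p at time t.
\begin{align*}
  y_{gpt} &= (\beta_0 + b_{0g} + a_p) + (\beta_1 + b_{1g})\, t + \varepsilon_{gpt}, \\
  b_{0g} &\sim \mathcal{N}(0, \sigma_{b_0}^2), \quad
  b_{1g} \sim \mathcal{N}(0, \sigma_{b_1}^2), \quad
  a_p \sim \mathcal{N}(0, \sigma_a^2), \quad
  \varepsilon_{gpt} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
% "No change over time in the gene set" then corresponds to
%   H_0 : \beta_1 = 0 \ \text{and} \ \sigma_{b_1}^2 = 0,
% which can be assessed with a likelihood ratio statistic
%   \Lambda = -2\,\bigl(\ell_{H_0} - \ell_{H_1}\bigr),
% where \ell denotes the maximized log-likelihood under each hypothesis.
```

In a formulation of this kind, the gene-specific slope $b_{1g}$ is what lets genes within the same set follow different time trends. Because $\sigma_{b_1}^2 = 0$ lies on the boundary of the parameter space, the reference distribution for $\Lambda$ is generally a mixture of chi-square distributions, not a single chi-square.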

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • AIDS Vaccines
  • Antiretroviral Therapy, Highly Active
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Factual
  • Gene Expression Profiling / methods*
  • HIV Infections / drug therapy
  • HIV Infections / prevention & control
  • Humans
  • Influenza Vaccines
  • Influenza, Human / prevention & control
  • Models, Biological*
  • Models, Statistical*

Substances

  • AIDS Vaccines
  • Influenza Vaccines

Grants and funding

This work was supported by the Investissements d’Avenir program managed by the ANR under reference ANR-10-LABX-77 and by the Vaccine Research Institute (VRI), F-94010 Creteil, France. BPH is the recipient of a Ph.D. fellowship from the Ecole des Hautes Etudes en Santé Publique (EHESP) Doctoral Network. The DALIA-1 study was supported by a grant from the French National Agency for Research on AIDS and Viral Hepatitis (ANRS: L’Agence Nationale pour la Recherche contre le SIDA et les hépatites virales). The VRI participated in study design, data collection and analysis for the DALIA trial. The funders had no role in the decision to publish or in the preparation of the manuscript.