Dynamic factor analysis with dependent Gaussian processes for high-dimensional gene expression trajectories

Biometrics. 2024 Oct 3;80(4):ujae131. doi: 10.1093/biomtc/ujae131.

Abstract

The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that complex diseases may be best described at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that characterizes the correlation among different pathways through dependent Gaussian processes (DGP) and maps the observed high-dimensional gene expression trajectories onto unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data. In simulations and real data analysis, it shows superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expression (closer point estimates and narrower predictive intervals). To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov chain Monte Carlo sampler with the R package GPFDA, which returns the maximum likelihood estimates of the DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models that include a DGP component. Our R package DGP4LCF, which implements the proposed approach, is available on the Comprehensive R Archive Network (CRAN).
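
The generative structure described in the abstract (a few correlated latent pathway trajectories, mapped to many genes through a sparse loading matrix) can be illustrated with a minimal simulation sketch. It is not the authors' method and does not use the DGP4LCF or GPFDA APIs. For simplicity it assumes a separable cross-covariance (a between-factor covariance matrix multiplied by a squared-exponential kernel over time), which is a simpler stand-in for the DGP construction in the paper. All names (`simulate`, `factor_cov`, `loadings`) and all numeric values are hypothetical, and the MCEM fitting step is not shown.

```python
import math
import random


def rbf(t1, t2, length_scale=1.0):
    """Squared-exponential kernel between two time points."""
    return math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate(times, factor_cov, loadings, noise_sd=0.1, seed=0):
    """Draw correlated latent factor trajectories, then noisy gene trajectories.

    Assumption (not from the paper): Cov(f_a(t_i), f_b(t_j)) equals
    factor_cov[a][b] * rbf(t_i, t_j), i.e. a separable covariance.
    """
    rng = random.Random(seed)
    k, q = len(factor_cov), len(times)
    n = k * q
    # Stacked index r = a * q + i encodes factor a at time point i.
    cov = [[factor_cov[r // q][c // q] * rbf(times[r % q], times[c % q])
            for c in range(n)] for r in range(n)]
    for r in range(n):
        cov[r][r] += 1e-6  # small jitter for numerical stability
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    f = [sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(n)]
    factors = [f[a * q:(a + 1) * q] for a in range(k)]
    # Each gene is a loading-weighted sum of factor trajectories plus noise.
    genes = [[sum(loadings[g][a] * factors[a][i] for a in range(k))
              + rng.gauss(0.0, noise_sd) for i in range(q)]
             for g in range(len(loadings))]
    return factors, genes


# Illustrative values only: 2 correlated pathways, 4 genes, 6 time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
factor_cov = [[1.0, 0.7], [0.7, 1.0]]      # cross-pathway dependence
loadings = [[1.0, 0.0], [0.8, 0.0],        # zeros encode sparsity: these
            [0.0, 1.2], [0.5, -0.7]]       # genes load on one pathway only
factors, genes = simulate(times, factor_cov, loadings)
```

Setting the off-diagonal entries of `factor_cov` to zero gives the classical independent-factor case that the proposed approach relaxes.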

Keywords: Monte Carlo expectation maximization; dependent Gaussian processes; high-dimensional biomarker expression trajectories; multivariate longitudinal data; pathways; sparse factor analysis.

MeSH terms

  • Bayes Theorem*
  • Biometry / methods
  • Computer Simulation*
  • Factor Analysis, Statistical
  • Gene Expression Profiling* / methods
  • Gene Expression Profiling* / statistics & numerical data
  • Humans
  • Markov Chains
  • Models, Statistical
  • Monte Carlo Method*
  • Normal Distribution