Dynamic factor analysis with dependent Gaussian processes for high-dimensional gene expression trajectories

Jiachen Cai; Robert J B Goudie; Colin Starr; Brian D M Tom

doi:10.1093/biomtc/ujae131

Biometrics. 2024 Oct 3;80(4):ujae131. doi: 10.1093/biomtc/ujae131.

Authors

Jiachen Cai¹, Robert J B Goudie¹, Colin Starr¹, Brian D M Tom¹

Affiliation

¹ MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom.

PMID: 39552514
DOI: 10.1093/biomtc/ujae131

Abstract

The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that characterizes such correlation among different pathways through dependent Gaussian processes (DGP) and maps the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data. In simulations and real data analysis, it shows superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expression (closer point estimates and narrower predictive intervals). To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov chain Monte Carlo sampler with the R package GPFDA, which returns the maximum likelihood estimates of the DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving a DGP component. Our R package DGP4LCF, which implements the proposed approach, is available on the Comprehensive R Archive Network (CRAN).

Keywords: Monte Carlo expectation maximization; dependent Gaussian processes; high-dimensional biomarker expression trajectories; multivariate longitudinal data; pathways; sparse factor analysis.
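
Illustrative sketch (not part of the published abstract). To make the model structure described above concrete, the following Python snippet simulates gene expression trajectories as sparse linear combinations of a few correlated latent pathway trajectories plus noise. Everything in it is an assumption chosen for illustration: the function names, the squared-exponential time kernel, the separable (Kronecker) cross-factor covariance, and all parameter values. It is not the authors' DGP construction, it is not the DGP4LCF or GPFDA API, and it only generates data; it performs no MCEM fitting.

```python
import numpy as np


def rbf_kernel(t, length_scale):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def simulate_trajectories(n_genes=50, n_factors=3, n_times=8,
                          factor_corr=0.6, length_scale=2.0,
                          sparsity=0.7, noise_sd=0.1, seed=0):
    """Simulate one subject's gene trajectories from correlated latent factors.

    All names and defaults are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_times)

    # Assumed cross-factor covariance: every pair of factors shares the same
    # correlation. Setting factor_corr=0 gives the classical independent case.
    B = np.full((n_factors, n_factors), factor_corr)
    np.fill_diagonal(B, 1.0)

    # Assumed separable covariance over (factor, time); a small jitter keeps
    # the Cholesky factorization numerically stable.
    K = np.kron(B, rbf_kernel(t, length_scale))
    K += 1e-6 * np.eye(n_factors * n_times)

    # Draw all factor trajectories jointly so they are dependent.
    chol = np.linalg.cholesky(K)
    factors = (chol @ rng.standard_normal(n_factors * n_times)).reshape(
        n_factors, n_times)

    # Sparse loadings: each gene loads on only a few factors (pathways).
    mask = rng.random((n_genes, n_factors)) > sparsity
    loadings = rng.standard_normal((n_genes, n_factors)) * mask

    # Observed genes = loadings times latent factors, plus measurement noise.
    noise = noise_sd * rng.standard_normal((n_genes, n_times))
    genes = loadings @ factors + noise
    return t, factors, loadings, genes


if __name__ == "__main__":
    t, factors, loadings, genes = simulate_trajectories()
    print("latent factors:", factors.shape)
    print("observed genes:", genes.shape)
    print("fraction of zero loadings:", float(np.mean(loadings == 0.0)))
```

With `factor_corr` set to zero the latent trajectories become independent, which corresponds to the classical assumption the abstract says the proposed approach relaxes.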

MeSH terms

Bayes Theorem*
Biometry / methods
Computer Simulation*
Factor Analysis, Statistical
Gene Expression Profiling* / methods
Gene Expression Profiling* / statistics & numerical data
Humans
Markov Chains
Models, Statistical
Monte Carlo Method*
Normal Distribution

Grants and funding

MC_UU_00002/2/MRC_/Medical Research Council/United Kingdom