Joint modeling of an outcome variable and integrated omics datasets using GLM-PO2PLS

J Appl Stat. 2024 Feb 21;51(13):2627-2651. doi: 10.1080/02664763.2024.2313458. eCollection 2024.

Abstract

In many studies of human diseases, multiple omics datasets are measured. Typically, these omics datasets are studied one by one with the disease, so the relationship between the omics is overlooked. Modeling the joint part of multiple omics and its association with the outcome disease can provide insights into the complex molecular basis of the disease. Several dimension reduction methods that jointly model multiple omics are available, as are two-stage approaches that model the omics and the outcome in separate steps. Holistic one-stage models for both omics and outcome are lacking. In this article, we propose a novel one-stage method that jointly models an outcome variable with omics. We establish the identifiability of the model and develop EM algorithms to obtain maximum likelihood estimators of the parameters for normally and Bernoulli distributed outcomes. Test statistics are proposed to infer the association between the outcome and the omics, and their asymptotic distributions are derived. Extensive simulation studies are conducted to evaluate the proposed model. The method is illustrated by modeling Down syndrome as the outcome and methylation and glycomics as the omics datasets. We show that the model provides more insight by jointly considering methylation and glycomics.

Keywords: Dimension reduction; PLS methods; data integration; generalized linear models; multiple omics.

Grants and funding

The authors received the following financial support for the research, authorship, and/or publication of this article: Zhujie Gu was supported by the European Union's Horizon 2020 research and innovation programme IMforFUTURE [grant number 721815]; the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking BigData@Heart [grant number 116074]; and the Medical Research Council [programme number MC-UU-00002/5]. Said el Bouhaddani was supported by ERA-Net E-Rare JTC 2018 (MSA-omics) [40-44000-98-2006/ 90030376507].