HIP: a method for high-dimensional multi-view data integration and prediction accounting for subgroup heterogeneity

Brief Bioinform. 2024 Sep 23;25(6):bbae470. doi: 10.1093/bib/bbae470.

Abstract

Epidemiologic and genetic studies in many complex diseases suggest subgroup disparities (e.g. by sex, race) in disease course and patient outcomes. We consider this from the standpoint of integrative analysis where we combine information from different views (e.g. genomics, proteomics, clinical data). Existing integrative analysis methods ignore the heterogeneity in subgroups, and stacking the views and accounting for subgroup heterogeneity does not model the association among the views. We propose Heterogeneity in Integration and Prediction (HIP), a statistical approach for joint association and prediction that leverages the strengths in each view to identify molecular signatures that are shared by and specific to a subgroup. We apply HIP to proteomics and gene expression data pertaining to chronic obstructive pulmonary disease (COPD) to identify proteins and genes shared by, and unique to, males and females, contributing to the variation in COPD, measured by airway wall thickness. Our COPD findings have identified proteins, genes, and pathways that are common across and specific to males and females, some implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms. HIP accounts for subgroup heterogeneity in multi-view data, ranks variables based on importance, is applicable to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. With the efficient algorithms implemented using PyTorch, this method has many potential scientific applications and could enhance multiomics research in health disparities. HIP is available at https://github.com/lasandrall/HIP, a video tutorial at https://youtu.be/O6E2OLmeMDo and a Shiny Application at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/ for users with limited programming experience.

Keywords: COPD; multi-view data; multi-view learning; one-step methods; subgroup heterogeneity.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Female
  • Genomics / methods
  • Humans
  • Male
  • Proteomics / methods
  • Pulmonary Disease, Chronic Obstructive* / genetics