High-sensitivity pattern discovery in large, paired multiomic datasets

Andrew R Ghazi; Kathleen Sucipto; Ali Rahnavard; Eric A Franzosa; Lauren J McIver; Jason Lloyd-Price; Emma Schwager; George Weingart; Yo Sup Moon; Xochitl C Morgan; Levi Waldron; Curtis Huttenhower

doi:10.1093/bioinformatics/btac232

Bioinformatics. 2022 Jun 24;38(Suppl 1):i378-i385. doi: 10.1093/bioinformatics/btac232.

Authors

Andrew R Ghazi¹ ² ³, Kathleen Sucipto¹, Ali Rahnavard¹ ², Eric A Franzosa¹ ² ³, Lauren J McIver¹ ² ³, Jason Lloyd-Price¹ ², Emma Schwager¹, George Weingart¹ ³, Yo Sup Moon¹, Xochitl C Morgan⁴, Levi Waldron⁵, Curtis Huttenhower¹ ² ³ ⁶

Affiliations

¹ Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
² Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
³ Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
⁴ Department of Microbiology and Immunology, University of Otago, Dunedin 9016, New Zealand.
⁵ Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York City, NY 10035, USA.
⁶ Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.

Abstract

Motivation: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.

Results: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, between the microbiome and the host transcriptome, and between metabolomic profiling and human health phenotypes.
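To make the comparison concrete, the following is a minimal sketch of the naive all-against-all baseline that the Results paragraph contrasts HAllA with. It is not HAllA itself and does not use the HAllA package. Every feature in one dataset is tested against every feature in the other, and the resulting p-values are corrected with the Benjamini-Hochberg procedure. The choice of Spearman correlation, the permutation-based p-value, and all function names (`ranks`, `spearman_perm_p`, `benjamini_hochberg`, `all_against_all`) are assumptions made for illustration only.

```python
import random
from itertools import product


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def spearman_perm_p(x, y, n_perm=999, rng=None):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = rng or random.Random(0)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank passing the step-up threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


def all_against_all(X, Y, alpha=0.05, n_perm=999, seed=0):
    """Test every feature of X against every feature of Y.

    X and Y are lists of features; each feature is a list of values
    measured on the same samples, in the same order.
    Returns the (i, j) index pairs that remain significant after FDR control.
    """
    rng = random.Random(seed)
    pairs = list(product(range(len(X)), range(len(Y))))
    pvals = [spearman_perm_p(X[i], Y[j], n_perm, rng) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return [pair for pair, k in zip(pairs, keep) if k]
```

Because this baseline runs one test per feature pair, the number of hypotheses grows with the product of the two feature counts, and the multiple-testing correction becomes correspondingly stricter. This is the cost that a block-wise, hierarchical strategy such as the one described above aims to reduce.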

Availability and implementation: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group.

Supplementary information: Supplementary data are available at Bioinformatics online.
