A multi-source data integration approach reveals novel associations between metabolites and renal outcomes in the German Chronic Kidney Disease study

Michael Altenbuchinger; Helena U Zacharias; Stefan Solbrig; Andreas Schäfer; Mustafa Büyüközkan; Ulla T Schultheiß; Fruzsina Kotsis; Anna Köttgen; Rainer Spang; Peter J Oefner; Jan Krumsiek; Wolfram Gronwald

doi:10.1038/s41598-019-50346-2

Sci Rep. 2019 Sep 27;9(1):13954. doi: 10.1038/s41598-019-50346-2.

Affiliations

¹ Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany.
² Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
³ Department of Physics, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany.
⁴ Institute of Genetic Epidemiology, Department of Biometry, Epidemiology, and Medical Bioinformatics, Faculty of Medicine and Medical Center, University of Freiburg, 79106, Freiburg, Germany.
⁵ Department of Nephrology, Medical Center, University of Freiburg, 79106, Freiburg, Germany.
⁶ Institute for Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany.
⁷ Institute of Computational Biomedicine, Weill Cornell University, New York, NY, 10021, USA.
⁸ Institute for Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany.

Abstract

Omics data offer novel insights into the pathophysiology of diseases and, consequently, into their diagnosis, treatment, and prevention. To this end, omics data are integrated with other data types, e.g., clinical, phenotypic, and demographic parameters of categorical or continuous nature. We exemplify this data integration problem for a chronic kidney disease (CKD) study comprising complex clinical, demographic, and one-dimensional ¹H nuclear magnetic resonance metabolic variables. Routine analysis screens for associations of single metabolic features with clinical parameters while accounting for confounders, which are typically chosen by expert knowledge. This knowledge can be incomplete or unavailable. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and evaluate its performance by sanity checks and predictive performance assessment on independent test data. In particular, we show that discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that associations discovered in routine univariate screening approaches can be biased by incorrect or incomplete expert knowledge. Our data integration approach reveals important associations between CKD comorbidities and metabolites, including novel associations of the plasma metabolite trimethylamine-N-oxide with cardiac arrhythmia and infarction in CKD stage 3 patients.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Biomarkers / blood
Female
Germany
Humans
Kidney / metabolism*
Magnetic Resonance Spectroscopy
Male
Metabolomics*
Models, Theoretical
Prognosis
Renal Insufficiency, Chronic / metabolism*

Substances

Biomarkers