Reliable predictors of long-term all-cause mortality are needed for middle-aged and older populations. Previous metabolomics studies of mortality have had several limitations: small numbers of participants and measured metabolites, reliance mainly on nuclear magnetic resonance spectroscopy, and the use of conventional statistical methods only. To overcome these limitations, we applied liquid chromatography-tandem mass spectrometry to measure >1000 metabolites in the METSIM study, which includes 10,197 men. We combined machine learning with conventional statistical methods to identify metabolites associated with all-cause mortality. Three independent methods (logistic regression, XGBoost, and Welch's t-test) identified 32 metabolites with the strongest associations with all-cause mortality (25 increasing and 7 decreasing the risk). Of these metabolites, 20 were novel; they spanned various metabolic pathways affecting the cardiovascular, renal, respiratory, endocrine, and central nervous systems. In Cox regression analyses, the hazard ratios (95% confidence intervals) for all-cause mortality were 1.76 (1.60-1.94) for clinical and laboratory risk factors, 1.89 (1.68-2.12) for the 25 metabolites, and 2.00 (1.81-2.22) for clinical and laboratory risk factors combined with the 25 metabolites. The main causes of death in our study were cancers (28%) and cardiovascular diseases (25%). We did not identify any metabolites associated with cancer but found 13 metabolites associated with an increased risk of cardiovascular diseases. Our study reports several novel metabolites associated with an increased risk of mortality and shows that these 25 metabolites improved the prediction of all-cause mortality above and beyond clinical and laboratory measurements.
Keywords: aging; artificial intelligence; metabolism; metabolites; mortality.