Nonexercise machine learning models for maximal oxygen uptake prediction in national population surveys

J Am Med Inform Assoc. 2023 Apr 19;30(5):943-952. doi: 10.1093/jamia/ocad035.

Abstract

Objective: Nonexercise algorithms are cost-effective methods to estimate cardiorespiratory fitness (CRF), but existing models have limited generalizability and predictive power. This study aimed to improve nonexercise algorithms using machine learning (ML) methods and data from US national population surveys.

Materials and methods: We used 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES). Maximal oxygen uptake (VO2 max), estimated from a submaximal exercise test, served as the gold standard measure of CRF in this study. We applied multiple ML algorithms to build 2 models: a parsimonious model using commonly available interview and examination data, and an extended model that additionally incorporated variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard clinical laboratory tests. Key predictors were identified using Shapley additive explanations (SHAP).
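SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. The `shap` library computes these efficiently for tree ensembles such as LightGBM (TreeSHAP); the minimal sketch below instead computes them exactly by enumerating every feature coalition, which is feasible only for a handful of features. It assumes the interventional value function, in which features outside a coalition are filled in from background rows and the model output is averaged. The toy model, its coefficients, and the feature names (`age`, `bmi`, `male`) are hypothetical illustrations, not the study's fitted model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def coalition_value(model: Model, x: Sequence[float],
                    background: Sequence[Sequence[float]], subset: Set[int]) -> float:
    """Mean model output with features in `subset` taken from x and the rest from each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def exact_shapley(model: Model, x: Sequence[float],
                  background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values by enumerating all 2^(n-1) coalitions per feature."""
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {j})
                        - coalition_value(model, x, background, s))
                phi[j] += weight * gain
    return phi


# Hypothetical toy predictor with an interaction term (made-up coefficients).
def toy_vo2max(v: Sequence[float]) -> float:
    age, bmi, male = v
    return 70.0 - 0.4 * age - 0.6 * bmi + 6.0 * male - 0.05 * age * male


if __name__ == "__main__":
    background = [[25.0, 22.0, 0.0], [35.0, 27.0, 1.0], [45.0, 30.0, 0.0], [30.0, 24.0, 1.0]]
    person = [45.0, 31.0, 1.0]
    phi = exact_shapley(toy_vo2max, person, background)
    baseline = sum(toy_vo2max(r) for r in background) / len(background)
    for name, value in zip(["age", "bmi", "male"], phi):
        print(f"{name}: {value:+.3f}")
    # Efficiency property: attributions sum to prediction minus mean background prediction.
    print(round(sum(phi), 6), round(toy_vo2max(person) - baseline, 6))
```

The exponential enumeration is why practical tools rely on model-specific algorithms; the sketch is useful mainly for checking properties such as efficiency on small examples.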

Results: Among the 5668 NHANES participants in the study population, 49.9% were women and the mean (SD) age was 32.5 (10.0) years. The light gradient boosting machine (LightGBM) performed best among the supervised ML algorithms evaluated. Compared with the best existing nonexercise algorithms applicable to NHANES data, the parsimonious LightGBM model (RMSE: 8.51 mL/kg/min [95% CI: 7.73-9.33]) and the extended LightGBM model (RMSE: 8.26 mL/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P < .001 for both).
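The abstract reports RMSE with 95% CIs and relative error reductions but does not state how the intervals or P values were obtained. As one common option, assumed here rather than taken from the paper, the sketch below computes RMSE, a percentile bootstrap CI, and a paired bootstrap CI for the relative RMSE reduction (1 - RMSE_new / RMSE_old), resampling participants so that both models are scored on the same rows. The data are synthetic and the helper names (`paired_bootstrap`, `percentile_ci`) are hypothetical.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of y (e.g. mL/kg/min)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(rmse_new: float, rmse_old: float) -> float:
    """Fractional error reduction of the new model relative to the old one."""
    return 1.0 - rmse_new / rmse_old


def percentile_ci(stats: List[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Percentile interval from a list of bootstrap statistics."""
    s = sorted(stats)
    lo = s[int(round(alpha / 2 * (len(s) - 1)))]
    hi = s[int(round((1 - alpha / 2) * (len(s) - 1)))]
    return lo, hi


def paired_bootstrap(y, pred_new, pred_old, n_boot=2000, seed=0):
    """Resample rows with replacement; score both models on the same resample."""
    rng = random.Random(seed)
    n = len(y)
    new_rmses, reductions = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y[i] for i in idx]
        r_new = rmse(yt, [pred_new[i] for i in idx])
        r_old = rmse(yt, [pred_old[i] for i in idx])
        new_rmses.append(r_new)
        reductions.append(relative_reduction(r_new, r_old))
    return percentile_ci(new_rmses), percentile_ci(reductions)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic outcome values and two synthetic predictors with different error SDs.
    y = [rng.gauss(40.0, 9.0) for _ in range(500)]
    pred_old = [v + rng.gauss(0.0, 10.0) for v in y]
    pred_new = [v + rng.gauss(0.0, 8.5) for v in y]
    r_new, r_old = rmse(y, pred_new), rmse(y, pred_old)
    ci_new, ci_red = paired_bootstrap(y, pred_new, pred_old)
    print(f"RMSE new: {r_new:.2f} (95% CI {ci_new[0]:.2f}-{ci_new[1]:.2f})")
    print(f"Relative reduction: {relative_reduction(r_new, r_old):.1%} "
          f"(95% CI {ci_red[0]:.1%} to {ci_red[1]:.1%})")
```

Pairing matters because both models are evaluated on the same participants; if the paired interval for the reduction excludes 0, the improvement is significant at the corresponding level under this bootstrap, although the study's actual test may differ.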

Discussion: Integrating ML with national data sources offers a novel approach to estimating cardiorespiratory fitness. This method may inform cardiovascular disease risk classification and clinical decision-making and could ultimately contribute to improved health outcomes.

Conclusion: Our nonexercise models estimate VO2 max in NHANES data more accurately than existing nonexercise algorithms.

Keywords: NHANES; cardiorespiratory fitness; machine learning; maximal oxygen uptake.

MeSH terms

  • Adult
  • Exercise Test* / methods
  • Exercise*
  • Female
  • Humans
  • Machine Learning
  • Male
  • Nutrition Surveys
  • Oxygen
  • Young Adult

Substances

  • Oxygen