Background: Various disease prediction models have been developed, capitalizing on the widespread use of electronic health records, but environmental factors that are important in the development of noncommunicable diseases are rarely included in these models. Hypertensive disorders of pregnancy are a leading cause of maternal morbidity and mortality and are known to cause several serious complications later in life.
Objective: This study aims to develop models for the early prediction of hypertensive disorders of pregnancy using comprehensive environmental factors drawn from self-report questionnaires completed in early pregnancy.
Study design: We developed machine learning and artificial intelligence models for the early prediction of hypertensive disorders of pregnancy using early pregnancy data from approximately 23,000 pregnancies in the Tohoku Medical Megabank Birth and Three Generation Cohort Study. For the interpretable models among them (i.e., logistic regression, random forest, and XGBoost), we identified the features most important for prediction from the regression coefficients or Gini coefficients.
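As a rough illustration of the tree-based importance mentioned in the study design, the sketch below computes the decrease in Gini impurity produced by a single split. It assumes that "Gini coefficients" refers to the impurity-based importance commonly reported for random forests, which the abstract does not state explicitly. The function names and toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children
    obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical toy data: 1 = case, 0 = non-case.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [1, 2, 3, 4, 6, 7, 8, 9]    # a split at 5 separates the classes
uninformative = [1, 6, 2, 7, 3, 8, 4, 9]  # a split at 5 leaves a 50/50 mix

gain_informative = impurity_decrease(informative, labels, 5)
gain_uninformative = impurity_decrease(uninformative, labels, 5)
```

In a forest, decreases of this kind are typically summed over every split that uses a feature and averaged across trees, so a feature that separates cases from non-cases well accumulates a larger importance score.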
Results: The early prediction models for hypertensive disorders of pregnancy reached an area under the receiver operating characteristic curve of 0.93, indicating sufficient predictive performance. The best-performing model was based on self-reported questionnaire data from early pregnancy (completed at a mean of 20.2 gestational weeks), which cover a comprehensive range of lifestyle factors. Interpretation of the models revealed that eating habits were dominantly important for prediction.
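For readers less familiar with the metric reported above, the following minimal sketch computes the area under the receiver operating characteristic curve as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. The function name, labels, and scores are hypothetical illustrations and do not reproduce the study's data or pipeline.

```python
def roc_auc(y_true, scores):
    """AUROC via pairwise comparison: fraction of (positive, negative) pairs
    in which the positive has the higher score; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for six pregnancies (1 = case, 0 = non-case).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = roc_auc(y_true, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.93 means that a case is ranked above a non-case in roughly 93% of such pairs.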
Conclusion: We developed high-performance models for the early prediction of hypertensive disorders of pregnancy using large-scale cohort data from the Tohoku Medical Megabank project. Our study showed that comprehensive lifestyle data from self-report questionnaires make it possible to predict the risk of hypertensive disorders of pregnancy at an early stage of pregnancy, which could aid early intervention to reduce that risk.
Keywords: disease prediction; hypertensive disorders of pregnancy; lifestyle; machine learning; obstetrics.
© 2024 The Authors.