Background: Early prediction of preeclampsia is challenging because its causes are poorly understood, its risk factors are varied, and it likely has multiple pathogenic phenotypes. Statistical learning methods are well-equipped to deal with a large number of variables, such as patients' clinical and laboratory data, and to select the most informative features automatically.
Objective: Our objective was to use statistical learning methods to analyze all available clinical and laboratory data that were obtained during routine prenatal visits in early pregnancy and to use them to develop a prediction model for preeclampsia.
Study design: This was a retrospective cohort study that used data from 16,370 births at Lucile Packard Children's Hospital at Stanford, CA, from April 2014 to January 2018. Two statistical learning algorithms were used to build a predictive model: (1) elastic net and (2) gradient boosting. Models for all preeclampsia and for early-onset preeclampsia (<34 weeks of gestation) were fitted using patient data that were available at <16 weeks of gestational age. The 67 variables that were considered in the models included maternal characteristics, medical history, routine prenatal laboratory results, and medication intake. The area under the receiver operating characteristic curve, true-positive rate, and false-positive rate were assessed via cross-validation.
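The workflow described above (an elastic net model and a gradient boosting model, each scored by cross-validated area under the curve) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the patient records, treats the elastic net as a penalized logistic regression, and picks the fold count and all hyperparameters (`C`, `l1_ratio`, `n_estimators`) arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 67 candidate variables, rare positive class.
# The sample size and class balance are assumptions chosen only for illustration.
X, y = make_classification(
    n_samples=2000,
    n_features=67,
    n_informative=10,
    n_redundant=5,
    weights=[0.9, 0.1],
    random_state=0,
)

models = {
    # Elastic net: logistic regression with a mixed L1/L2 penalty.
    # Features are standardized so the penalty treats them on a common scale.
    "elastic_net": make_pipeline(
        StandardScaler(),
        LogisticRegression(
            penalty="elasticnet",
            solver="saga",  # the solver that supports the elastic net penalty
            l1_ratio=0.5,   # assumed mix of L1 and L2
            C=0.1,          # assumed inverse regularization strength
            max_iter=5000,
        ),
    ),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the share of positive cases similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Each sample is scored by a model that never saw it during fitting.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    results[name] = roc_auc_score(y, proba)

for name, auc_value in results.items():
    print(f"{name}: cross-validated AUC = {auc_value:.3f}")
```

Scoring out-of-fold predictions, rather than predictions on the training data, is what keeps the reported area under the curve from being optimistically biased.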
Results: Using the elastic net algorithm, we developed a prediction model that contained a subset of the most informative features from all variables. The obtained prediction model for preeclampsia yielded an area under the curve of 0.79 (95% confidence interval, 0.75-0.83), true-positive rate (sensitivity) of 45.2%, and false-positive rate of 8.1%. The prediction model for early-onset preeclampsia achieved an area under the curve of 0.89 (95% confidence interval, 0.84-0.95), true-positive rate of 72.3%, and false-positive rate of 8.8%.
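To make the reported metrics concrete, the sketch below computes a true-positive rate, a false-positive rate, and an area under the curve from predicted risk scores in plain Python. The scores, labels, and the 0.5 cutoff are invented toy values, and the helper names `rates_at_cutoff` and `auc` are hypothetical; the abstract does not state how the study's operating point was chosen.

```python
def rates_at_cutoff(scores, labels, cutoff):
    """Return (true-positive rate, false-positive rate) when a score
    at or above the cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 cases (label 1) and 5 non-cases (label 0).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

tpr, fpr = rates_at_cutoff(scores, labels, cutoff=0.5)
area = auc(scores, labels)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}, AUC = {area:.3f}")
# prints "TPR = 0.667, FPR = 0.200, AUC = 0.933"
```

The true-positive and false-positive rates describe a single cutoff, whereas the area under the curve summarizes ranking quality across all cutoffs, which is why a model can pair a moderate sensitivity with a comparatively high area under the curve.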
Conclusion: Statistical learning methods applied to a retrospective cohort automatically identified a set of informative features and yielded high predictive performance for preeclampsia risk from routine early-pregnancy information.
Keywords: early prediction of preeclampsia; elastic net; gradient boosting algorithm; machine learning; preeclampsia; statistical learning.
Copyright © 2020 Elsevier Inc. All rights reserved.