Development and validation of a spontaneous preterm birth risk prediction algorithm based on maternal bioinformatics: A single-center retrospective study

Yu Chen; Xinyan Shi; Zhiyi Wang; Lin Zhang

doi:10.1186/s12884-024-06933-x

BMC Pregnancy Childbirth. 2024 Nov 18;24(1):763. doi: 10.1186/s12884-024-06933-x.

Authors

Yu Chen^#¹ ², Xinyan Shi^#³, Zhiyi Wang³, Lin Zhang⁴

Affiliations

¹ School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, 310053, China.
² Department of Clinical Laboratory, Hangzhou Women's Hospital, No. 369, Kunpeng Road, Shangcheng District Hangzhou, Hangzhou, 310008, Zhejiang, China.
³ Department of Clinical Laboratory, Hangzhou Women's Hospital, No. 369, Kunpeng Road, Shangcheng District Hangzhou, Hangzhou, 310008, Zhejiang, China.
⁴ Department of Obstetrics, Hangzhou Women's Hospital, Hangzhou, Zhejiang, 310008, China.

^# Contributed equally.

PMID: 39558279
DOI: 10.1186/s12884-024-06933-x

Abstract

Background: Spontaneous preterm birth (sPTB) is a primary cause of adverse neonatal outcomes. This study aimed to analyze the factors influencing the occurrence of sPTB in pregnant women and to construct and validate a predictive model for sPTB risk based on big data from clinical and laboratory assessments during pregnancy.

Methods: A retrospective analysis was conducted on the clinical data of 3,082 pregnant women. Women who delivered before 37 weeks of gestation formed the sPTB group, and those who delivered at or after 37 weeks formed the full-term group. Five machine learning models were compared on AUC, accuracy, sensitivity, specificity, and precision to identify the optimal predictive model. The top 10 predictive variables were selected according to their significance for prediction. The data were then divided into a training set (70%) and a validation set (30%). External data were also used to validate the model's predictive performance.
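The evaluation steps described in the Methods (a 70/30 split, then comparison on AUC, accuracy, sensitivity, specificity, and precision) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names, the random seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions, and the abstract does not state whether the split was stratified.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it 70% / 30% (seed is an assumption)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision from binary labels (1 = sPTB)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. Quadratic in sample size, which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = sPTB, 0 = full-term; scores are model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(classification_metrics(y_true, y_pred))
print("AUC:", round(auc(y_true, scores), 3))
```

Computing the same five metrics for each candidate model on the same validation split is what makes the comparison between algorithms meaningful.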

Results: A total of 24 indicators differed significantly between the two groups. In predicting the risk of preterm birth, the XGBoost algorithm performed best, with an ROC AUC of 0.89 (95% CI: 0.88-0.90). The top 10 critical indicators were ALP, AFP, ALB, HCT, TC, DBP, ALT, PLT, height, and SBP, which are essential for constructing an accurate predictive model. The model exhibited stable performance on both the training and validation sets, with AUC values of 0.93 and 0.87, respectively. On the external test set, the model achieved an AUC of 0.79.
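The abstract reports a 95% CI for the AUC but does not say how it was obtained. One common option, shown here purely as an assumption, is a percentile bootstrap: resample the evaluation set with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. The function names, the number of resamples, the seed, and the synthetic data below are all hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (illustrative only)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data (hypothetical): positives tend to receive higher scores.
gen = random.Random(42)
y_true = [1 if i % 3 == 0 else 0 for i in range(120)]
scores = [gen.random() + 0.4 * t for t in y_true]

point = auc(y_true, scores)
lower, upper = bootstrap_auc_ci(y_true, scores, n_boot=500)
print(f"AUC {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The interval narrows as the evaluation set grows, so a narrow interval such as the one reported reflects the size of the data being resampled as much as the quality of the model.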

Conclusions: At the time of delivery, ALP, AFP, ALB, HCT, TC, DBP, ALT, PLT, height, and SBP are influential factors for sPTB in pregnant women. The XGBoost model built on these factors performed best among the algorithms compared.

Keywords: Big data; Clinical laboratory; Machine learning; Model; Spontaneous preterm birth.