Background: Spontaneous preterm birth (sPTB) is a primary cause of adverse neonatal outcomes. This study aimed to analyze the factors influencing the occurrence of sPTB in pregnant women and to construct and validate a predictive model of sPTB risk based on big data from clinical and laboratory assessments during pregnancy.
Methods: Clinical data from 3,082 pregnant women were analyzed retrospectively. Women who delivered before 37 weeks of gestation were assigned to the sPTB group, and those who delivered at or after 37 weeks to the full-term group. The performance of five machine learning models was compared using metrics including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and precision to identify the optimal predictive model. The top 10 predictive variables were selected based on their importance for predicting sPTB. The data were then divided into a training set (70%) and a validation set (30%). External data were also used to validate the model's predictive performance.
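As an illustration of the evaluation described in the Methods, the following is a minimal sketch of a 70/30 split and of the reported metrics, assuming binary labels (1 = sPTB, 0 = full-term) and a model that outputs a risk score. The function names (`split_70_30`, `confusion_metrics`, `auc`), the random seed, and the 0.5 threshold in the usage note are hypothetical choices for this example and are not taken from the study.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and split it into 70% training, 30% validation."""
    rng = random.Random(seed)          # fixed seed (assumed) for reproducibility
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))
```

In use, predicted scores would be thresholded (for example at 0.5, an assumed cut-off) to obtain `y_pred` for `confusion_metrics`, while `auc` takes the raw scores directly. The pairwise AUC shown here is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation would be preferable at the scale of the cohort.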
Results: A total of 24 indicators with significant differences were identified. In predicting the risk of preterm birth, the XGBoost algorithm performed best, with an AUC of 0.89 (95% CI: 0.88-0.90). The top 10 indicators were ALP, AFP, ALB, HCT, TC, DBP, ALT, PLT, height, and SBP, which are essential for constructing an accurate predictive model. The model performed stably on the training and validation sets, with AUC values of 0.93 and 0.87, respectively. On the external test set, the model achieved an AUC of 0.79.
Conclusions: ALP, AFP, ALB, HCT, TC, DBP, ALT, PLT, height, and SBP at the time of delivery are influential factors for sPTB in pregnant women. The XGBoost model constructed from these factors performed best among the models compared.
Keywords: Big data; Clinical laboratory; Machine learning; Model; Spontaneous preterm birth.
© 2024. The Author(s).