Machine learning algorithms in constructing prediction models for assisted reproductive technology (ART) related live birth outcomes

Junwei Peng; Xiaoyujie Geng; Yiyue Zhao; Zhijin Hou; Xin Tian; Xinyi Liu; Yuanyuan Xiao; Yang Liu

doi:10.1038/s41598-024-83781-x

Sci Rep. 2024 Dec 30;14(1):32083. doi: 10.1038/s41598-024-83781-x.

Authors

Junwei Peng^#¹ ², Xiaoyujie Geng^#¹, Yiyue Zhao¹, Zhijin Hou¹, Xin Tian², Xinyi Liu², Yuanyuan Xiao³, Yang Liu⁴

Affiliations

¹ Reproductive Medicine Department, Second Affiliated Hospital of Kunming Medical University, Kunming, China.
² Division of Epidemiology and Health Statistics, School of Public Health, Kunming Medical University, Kunming, China.
³ Division of Epidemiology and Health Statistics, School of Public Health, Kunming Medical University, Kunming, China.
⁴ Reproductive Medicine Department, Second Affiliated Hospital of Kunming Medical University, Kunming, China.

^# Contributed equally.

Abstract

Existing models for predicting live birth outcomes in patients who receive assisted reproductive technology (ART) have methodological or study design limitations that greatly hinder their dissemination and application. Models suitable for Chinese couples have not yet been identified. We conducted a retrospective study using a database of 11,938 couples who underwent in vitro fertilization (IVF) treatment between January 2015 and December 2022 at a medical institution in Yunnan province, southwest China. Candidate predictors were screened using importance scores. Four machine learning (ML) algorithms, namely random forest, extreme gradient boosting, light gradient boosting machine, and binary logistic regression, were used to construct prediction models. Predictive performance was initially assessed and then validated using cross-validation and bootstrap methods. Seven predictors were identified: maternal age, duration of infertility, basal follicle-stimulating hormone (FSH), progressive sperm motility, progesterone (P) on HCG day, estradiol (E2) on HCG day, and luteinizing hormone (LH) on HCG day. Of the four predictive models, the random forest model and the logistic regression model were considered to have the best performance, with areas under the receiver operating characteristic curve (AUROC) of 0.671 (95% CI 0.630-0.713) and 0.674 (95% CI 0.627-0.720), respectively. The Brier scores were 0.183 (95% CI 0.170-0.196) for both models. Considering the simplicity of model fitting, we recommend the logistic regression model as the best predictive model for live birth. Furthermore, maternal age, P on HCG day, and E2 on HCG day were deemed to contribute most to model prediction.
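The two performance measures reported above, AUROC (discrimination) and the Brier score (mean squared error of predicted probabilities), together with a bootstrap 95% CI, can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the percentile bootstrap is one common choice that the abstract does not specify, and the function names `auroc`, `brier`, and `bootstrap_ci` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def auroc(y_true, y_prob):
    """AUROC from the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_prob)
    order = np.argsort(y_prob, kind="mergesort")
    sorted_prob = y_prob[order]
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and sorted_prob[j + 1] == sorted_prob[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos = int(y_true.sum())
    n_neg = n - n_pos
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def brier(y_true, y_prob):
    """Brier score: mean squared difference between probability and outcome."""
    return float(np.mean((np.asarray(y_prob, dtype=float) - np.asarray(y_true)) ** 2))

def bootstrap_ci(metric, y_true, y_prob, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, not confirmed by the source)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample couples with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples containing only one outcome class
        stats.append(metric(y_true[idx], y_prob[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic illustration only: hypothetical predicted live-birth probabilities
# and outcomes drawn from them. These are NOT the study data.
rng = np.random.default_rng(42)
risk = rng.uniform(0.1, 0.6, size=2000)
y = (rng.uniform(size=2000) < risk).astype(int)

auc_point = auroc(y, risk)
auc_lo, auc_hi = bootstrap_ci(auroc, y, risk)
brier_point = brier(y, risk)
brier_lo, brier_hi = bootstrap_ci(brier, y, risk)
print(f"AUROC {auc_point:.3f} (95% CI {auc_lo:.3f}-{auc_hi:.3f})")
print(f"Brier {brier_point:.3f} (95% CI {brier_lo:.3f}-{brier_hi:.3f})")
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of outcomes. For the Brier score lower is better, and a constant prediction of 0.5 scores 0.25.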

Keywords: Clinical prediction model; In vitro fertilization; Infertility; Live birth; Machine learning.

MeSH terms

Adult
Algorithms
China
Female
Fertilization in Vitro / methods
Humans
Live Birth* / epidemiology
Logistic Models
Machine Learning*
Male
Maternal Age
Pregnancy
ROC Curve
Reproductive Techniques, Assisted*
Retrospective Studies
