Purpose: This article aims to assess how differences in maternal age distributions between IVF clinics affect the performance of an artificial intelligence model for embryo viability prediction and proposes a method to account for such differences.
Methods: Using retrospectively collected data from 4805 fresh and frozen single blastocyst transfers of embryos incubated for 5 to 6 days, the discriminative performance was assessed based on fetal heartbeat outcomes. The data was collected from 4 clinics, and the discrimination was measured in terms of the area under ROC curves (AUC) for each clinic. To account for the different age distributions between clinics, a method for age-standardizing the AUCs was developed in which the clinic-specific AUCs were standardized using weights for each embryo according to the relative frequency of the maternal age in the relevant clinic compared to the age distribution in a common reference population.
Results: There was substantial variation in the clinic-specific AUCs with estimates ranging from 0.58 to 0.69 before standardization. The age-standardization of the AUCs reduced the between-clinic variance by 16%. Most notably, three of the clinics had quite similar AUCs after standardization, while the last clinic had a markedly lower AUC both with and without standardization.
Conclusion: The method of using age-standardization of the AUCs that is proposed in this article mitigates some of the variability between clinics. This enables a comparison of clinic-specific AUCs where the difference in age distributions is accounted for.
Keywords: Artificial intelligence; Embryo selection; Model performance; Time-lapse.
© 2023. The Author(s).