Comparing performance between clinics of an embryo evaluation algorithm based on time-lapse images and machine learning

Martin N Johansen; Erik T Parner; Mikkel F Kragh; Keiichi Kato; Satoshi Ueno; Stefan Palm; Manuel Kernbach; Başak Balaban; İpek Keleş; Anette V Gabrielsen; Lea H Iversen; Jørgen Berntsen

doi:10.1007/s10815-023-02871-3

J Assist Reprod Genet. 2023 Sep;40(9):2129-2137. doi: 10.1007/s10815-023-02871-3. Epub 2023 Jul 10.

Authors

Martin N Johansen¹, Erik T Parner², Mikkel F Kragh³ ⁴, Keiichi Kato⁵, Satoshi Ueno⁵, Stefan Palm⁶, Manuel Kernbach⁶, Başak Balaban⁷, İpek Keleş⁸, Anette V Gabrielsen⁹, Lea H Iversen⁹, Jørgen Berntsen³

Affiliations

¹ Vitrolife A/S, Jens Juuls Vej 18-20, 8260, Viby J, Denmark.
² Section for Biostatistics, Department of Public Health, Aarhus University, Aarhus, Denmark.
³ Vitrolife A/S, Jens Juuls Vej 18-20, 8260, Viby J, Denmark.
⁴ The AI Lab Aps, Aarhus, Denmark.
⁵ Kato Ladies Clinic, Tokyo, Japan.
⁶ MVZ PAN Institut, Cologne, Germany.
⁷ American Hospital, Istanbul, Turkey.
⁸ Koc University Hospital, Istanbul, Turkey.
⁹ Fertility Clinic, Horsens Regional Hospital, Horsens, Denmark.

Abstract

Purpose: This article aims to assess how differences in maternal age distributions between IVF clinics affect the performance of an artificial intelligence model for embryo viability prediction and proposes a method to account for such differences.

Methods: Using retrospectively collected data from 4805 fresh and frozen single blastocyst transfers of embryos incubated for 5 to 6 days, the discriminative performance was assessed based on fetal heartbeat outcomes. The data was collected from 4 clinics, and the discrimination was measured in terms of the area under ROC curves (AUC) for each clinic. To account for the different age distributions between clinics, a method for age-standardizing the AUCs was developed in which the clinic-specific AUCs were standardized using weights for each embryo according to the relative frequency of the maternal age in the relevant clinic compared to the age distribution in a common reference population.
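
The weighting idea in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact estimator, so the sketch assumes that maternal age is binned into groups, that each embryo's weight is the reference-population share of its age group divided by the clinic's share of that group, and that the weighted AUC is the weighted proportion of (positive, negative) pairs in which the positive embryo scores higher, with ties counted as half. The function names `age_weights` and `weighted_auc` and the toy data are hypothetical.

```python
from collections import Counter


def age_weights(clinic_ages, reference_ages):
    """Return one weight per embryo in the clinic.

    Assumed weighting: share of the embryo's age group in the reference
    population divided by the share of that age group in the clinic.
    """
    clinic_counts = Counter(clinic_ages)
    ref_counts = Counter(reference_ages)
    n_clinic = len(clinic_ages)
    n_ref = len(reference_ages)
    weights = []
    for age in clinic_ages:
        clinic_share = clinic_counts[age] / n_clinic
        ref_share = ref_counts.get(age, 0) / n_ref
        weights.append(ref_share / clinic_share)
    return weights


def weighted_auc(scores, outcomes, weights):
    """Weighted AUC over all (positive, negative) pairs.

    Each pair carries the product of the two embryo weights; a pair counts
    fully if the positive embryo has the higher score and half if tied.
    """
    pos = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 0]
    concordant = 0.0
    total = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg
            total += pair_weight
            if s_pos > s_neg:
                concordant += pair_weight
            elif s_pos == s_neg:
                concordant += 0.5 * pair_weight
    if total == 0:
        raise ValueError("need at least one positive and one negative with nonzero weight")
    return concordant / total


# Toy data, invented purely for illustration (not from the study).
scores = [0.9, 0.8, 0.4, 0.3]             # model scores per transferred embryo
outcomes = [1, 0, 1, 0]                   # 1 = fetal heartbeat, 0 = none
clinic_ages = ["<35", "<35", "<35", "35+"]
reference_ages = ["<35", "35+", "<35", "35+"]

weights = age_weights(clinic_ages, reference_ages)
crude = weighted_auc(scores, outcomes, [1.0] * len(scores))
standardized = weighted_auc(scores, outcomes, weights)
print(f"crude AUC: {crude:.3f}, age-standardized AUC: {standardized:.3f}")
```

With all weights equal to 1 the function reduces to the ordinary AUC, so the crude and standardized estimates can be compared directly. The pairwise loop is quadratic in the number of transfers; a rank-based formulation would scale better on data sets of several thousand transfers.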

Results: There was substantial variation in the clinic-specific AUCs with estimates ranging from 0.58 to 0.69 before standardization. The age-standardization of the AUCs reduced the between-clinic variance by 16%. Most notably, three of the clinics had quite similar AUCs after standardization, while the last clinic had a markedly lower AUC both with and without standardization.

Conclusion: The age-standardization of AUCs proposed in this article mitigates some of the variability between clinics. This enables a comparison of clinic-specific AUCs in which the difference in age distributions is accounted for.

Keywords: Artificial intelligence; Embryo selection; Model performance; Time-lapse.

MeSH terms

Artificial Intelligence*
Blastocyst*
Fertilization in Vitro
Humans
Machine Learning
Retrospective Studies
Time-Lapse Imaging