Clinical validation of explainable AI for fetal growth scans through multi-level, cross-institutional prospective end-user evaluation

Zahra Bashir; Manxi Lin; Aasa Feragen; Kamil Mikolaj; Caroline Taksøe-Vester; Anders Nymark Christensen; Morten B S Svendsen; Mette Hvilshøj Fabricius; Lisbeth Andreasen; Mads Nielsen; Martin Grønnebæk Tolsgaard

doi:10.1038/s41598-025-86536-4

Sci Rep. 2025 Jan 15;15(1):2074. doi: 10.1038/s41598-025-86536-4.

Authors

Zahra Bashir¹ ² ³, Manxi Lin⁴, Aasa Feragen⁴, Kamil Mikolaj⁴, Caroline Taksøe-Vester⁵ ⁶ ⁷, Anders Nymark Christensen⁴, Morten B S Svendsen⁶, Mette Hvilshøj Fabricius⁸, Lisbeth Andreasen⁹, Mads Nielsen¹⁰, Martin Grønnebæk Tolsgaard⁵ ⁶ ⁷

Affiliations

¹ Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
² Department of Obstetrics and Gynecology, Slagelse Hospital, Fælledvej 11, 4200, Slagelse, Denmark.
³ Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark.
⁴ Technical University of Denmark (DTU), Lyngby, Denmark.
⁵ Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
⁶ Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark.
⁷ Center of Fetal Medicine, Dept. of Obstetrics, Copenhagen University Hospital, Rigshospitalet, Denmark.
⁸ Department of Obstetrics and Gynecology, Slagelse Hospital, Fælledvej 11, 4200, Slagelse, Denmark.
⁹ Department of Obstetrics and Gynecology, Hvidovre Hospital, Hvidovre, Denmark.
¹⁰ Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.

PMID: 39820804
DOI: 10.1038/s41598-025-86536-4

Abstract

We aimed to develop and evaluate Explainable Artificial Intelligence (XAI) for fetal ultrasound that gives end-users feedback in the form of actionable concepts, using a prospective, cross-center, multi-level approach. We developed, implemented, and tested a deep-learning model for fetal growth scans using both retrospective and prospective data. We used a modified Progressive Concept Bottleneck Model with pre-established clinical concepts as explanations (feedback on image optimization and presence of anatomical landmarks) as well as segmentations (outlining anatomical landmarks). The model was evaluated prospectively by assessing the following: the model's ability to assess standard plane quality, the correctness of explanations, the clinical usefulness of explanations, and the model's ability to discriminate between different levels of expertise among clinicians. We used 9352 annotated images for model development and 100 videos for prospective evaluation. Overall classification accuracy was 96.3%. The model's performance in assessing standard plane quality was on par with that of clinicians. The model's segmentations and explanations agreed with those provided by expert clinicians in 83.3% and 74.2% of cases, respectively. A panel of clinicians evaluated segmentations as useful in 72.4% of cases and explanations as useful in 75.0% of cases. Finally, the model reliably discriminated between the performances of clinicians with different levels of experience (p-values < 0.01 for all measures). Our study has successfully developed an Explainable AI model for real-time feedback to clinicians performing fetal growth scans. This work contributes to the existing literature by addressing the gap in the clinical validation of Explainable AI models within fetal medicine, emphasizing the importance of multi-level, cross-institutional, and prospective evaluation with clinician end-users. The prospective clinical validation uncovered challenges and opportunities that could not have been anticipated if we had only focused on retrospective development and validation, such as leveraging AI to gauge operator competence in fetal ultrasound.
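
The concept-bottleneck idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the concept names, the 0.5 threshold, and the helper functions `assess_plane` and `percent_agreement` are all hypothetical, chosen only to show how a quality decision that depends solely on predicted concepts can be turned into actionable feedback, and how a simple agreement rate against expert labels could be computed.

```python
from dataclasses import dataclass

# Hypothetical concept names for illustration only; the abstract does not
# list the actual clinical concepts used by the model.
CONCEPTS = ("landmark_a_visible", "landmark_b_visible", "magnification_ok")


@dataclass
class Assessment:
    is_standard_plane: bool
    feedback: list


def assess_plane(concept_probs, threshold=0.5):
    """Decide plane quality from concept probabilities only (the bottleneck).

    Because the decision uses nothing but the concepts, every rejection can
    be explained by naming the concepts that fell below the threshold.
    The threshold value is an assumption, not taken from the paper.
    """
    missing = [c for c in CONCEPTS if concept_probs.get(c, 0.0) < threshold]
    return Assessment(
        is_standard_plane=not missing,
        feedback=[f"improve: {c}" for c in missing],
    )


def percent_agreement(model_labels, expert_labels):
    """Percentage of cases where model and expert labels match."""
    if not model_labels or len(model_labels) != len(expert_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return 100.0 * matches / len(model_labels)


# Example with made-up inputs.
result = assess_plane(
    {"landmark_a_visible": 0.9, "landmark_b_visible": 0.2, "magnification_ok": 0.8}
)
agreement = percent_agreement([1, 0, 1, 1], [1, 1, 1, 1])
```

In this sketch the feedback list doubles as the explanation shown to the operator, which is the design property that distinguishes a concept bottleneck from a post-hoc explanation method.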

Keywords: Artificial intelligence, Fetal growth scans, Explainable AI, Human-AI collaboration.

Publication types

Validation Study
Multicenter Study

MeSH terms

Artificial Intelligence*
Deep Learning
Female
Fetal Development*
Humans
Pregnancy
Prospective Studies
Retrospective Studies
Ultrasonography, Prenatal* / methods

Grants and funding

A1152/The Local Research Fund for Næstved, Slagelse, and Ringsted Hospitals,