During the COVID-19 pandemic, artificial intelligence (AI) models were created to address health-care resource constraints. Previous research shows that health-care datasets often have limitations, leading to biased AI technologies. This systematic review assessed datasets used for AI development during the pandemic and identified several deficiencies. Datasets were identified by screening articles in MEDLINE and by using Google Dataset Search. In total, 192 datasets were analysed for metadata completeness, composition, data accessibility, and ethical considerations. Findings revealed substantial gaps: only 48% of datasets documented individuals' country of origin, 43% reported age, and under 25% included sex, gender, race, or ethnicity. Information on data labelling, ethical review, or consent was frequently missing. Many datasets reused data with inadequate traceability. Notably, historical paediatric chest x-rays appeared in some datasets without acknowledgment. These deficiencies highlight the need for better data quality and transparent documentation to reduce the risk of biased AI models being developed in future health emergencies.
Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.