The impact of commercial health datasets on medical research and health-care algorithms

Lancet Digit Health. 2023 May;5(5):e288-e294. doi: 10.1016/S2589-7500(23)00025-0.

Abstract

As the health-care industry emerges into a new era of digital health driven by cloud data storage, distributed computing, and machine learning, health-care data have become a premium commodity with value for private and public entities. Current frameworks of health data collection and distribution, whether from industry, academia, or government institutions, are imperfect and do not allow researchers to leverage the full potential of downstream analytical efforts. In this Health Policy paper, we review the current landscape of commercial health data vendors, with special emphasis on the sources of their data, challenges associated with data reproducibility and generalisability, and ethical considerations for data vending. We argue for sustainable approaches to curating open-source health data to enable global populations to be included in the biomedical research community. However, to fully implement these approaches, key stakeholders should come together to make health-care datasets increasingly accessible, inclusive, and representative, while balancing the privacy and rights of individuals whose data are being collected.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Algorithms*
  • Biomedical Research*
  • Consumer Health Information / economics
  • Consumer Health Information / ethics
  • Datasets as Topic* / economics
  • Datasets as Topic* / ethics
  • Datasets as Topic* / trends
  • Humans
  • Privacy
  • Reproducibility of Results