Medical large language models are vulnerable to data-poisoning attacks

Daniel Alexander Alber; Zihao Yang; Anton Alyakin; Eunice Yang; Sumedha Rai; Aly A Valliani; Jeff Zhang; Gabriel R Rosenbaum; Ashley K Amend-Thomas; David B Kurland; Caroline M Kremer; Alexander Eremiev; Bruck Negash; Daniel D Wiggan; Michelle A Nakatsuka; Karl L Sangwon; Sean N Neifert; Hammad A Khan; Akshay Vinod Save; Adhith Palla; Eric A Grin; Monika Hedman; Mustafa Nasir-Moin; Xujin Chris Liu; Lavender Yao Jiang; Michal A Mankowski; Dorry L Segev; Yindalon Aphinyanaphongs; Howard A Riina; John G Golfinos; Daniel A Orringer; Douglas Kondziolka; Eric Karl Oermann

doi:10.1038/s41591-024-03445-1

Nat Med. 2025 Jan 8. doi: 10.1038/s41591-024-03445-1. Online ahead of print.

Authors

Daniel Alexander Alber^{1,2}, Zihao Yang^{3,4}, Anton Alyakin^{3,5}, Eunice Yang^{3,6}, Sumedha Rai^{3,4}, Aly A Valliani³, Jeff Zhang^{3,7,8}, Gabriel R Rosenbaum³, Ashley K Amend-Thomas³, David B Kurland³, Caroline M Kremer^{3,9}, Alexander Eremiev^{3,9}, Bruck Negash^{3,9}, Daniel D Wiggan^{3,9}, Michelle A Nakatsuka^{3,9}, Karl L Sangwon^{3,9}, Sean N Neifert³, Hammad A Khan³, Akshay Vinod Save³, Adhith Palla^{3,9}, Eric A Grin^{3,9}, Monika Hedman³, Mustafa Nasir-Moin^{3,10}, Xujin Chris Liu^{3,11}, Lavender Yao Jiang^{3,4}, Michal A Mankowski¹², Dorry L Segev^{7,12}, Yindalon Aphinyanaphongs^{7,8}, Howard A Riina^{3,13}, John G Golfinos^{3,14}, Daniel A Orringer^{3,15}, Douglas Kondziolka^{3,16}, Eric Karl Oermann^{3,4,13,17}

Affiliations

¹ Department of Neurosurgery, NYU Langone Health, New York, NY, USA.
² New York University Grossman School of Medicine, New York, NY, USA.
³ Department of Neurosurgery, NYU Langone Health, New York, NY, USA.
⁴ Center for Data Science, New York University, New York, NY, USA.
⁵ Washington University School of Medicine, Saint Louis, MO, USA.
⁶ Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA.
⁷ Department of Population Health, NYU Langone Health, New York, NY, USA.
⁸ Division of Applied AI Technologies, MCIT Department of Health Informatics, NYU Langone Health, New York, NY, USA.
⁹ New York University Grossman School of Medicine, New York, NY, USA.
¹⁰ Harvard Medical School, Boston, MA, USA.
¹¹ Electrical and Computer Engineering, Tandon School of Engineering, New York, NY, USA.
¹² Department of Surgery, NYU Langone Health, New York, NY, USA.
¹³ Department of Radiology, NYU Langone Health, New York, NY, USA.
¹⁴ Department of Otolaryngology-Head and Neck Surgery, NYU Langone Health, New York, NY, USA.
¹⁵ Department of Pathology, NYU Langone Health, New York, NY, USA.
¹⁶ Department of Radiation Oncology, NYU Langone Health, New York, NY, USA.
¹⁷ Neuroscience Institute, NYU Langone Health, New York, NY, USA.

PMID: 39779928

Abstract

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.
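
The screening idea described in the abstract, checking generated statements against hard-coded relationships in a knowledge graph, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `Triple` type, the `screen` function, and the two-entry toy graph are hypothetical names and data introduced only for illustration, and the step that extracts (subject, relation, object) triples from free-text LLM output is assumed to happen upstream.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """A (subject, relation, object) medical claim. Hypothetical type for illustration."""
    subject: str
    relation: str
    obj: str


# Toy knowledge graph: hand-written illustrative entries, NOT a real biomedical graph.
TOY_GRAPH: Set[Triple] = {
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("amoxicillin", "treats", "bacterial infection"),
}


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each field so matching ignores case and stray spaces."""
    return Triple(*(part.strip().lower() for part in triple))


def screen(claims: Iterable[Triple], graph: Set[Triple] = TOY_GRAPH) -> List[Triple]:
    """Return the claims that have no matching relationship in the graph.

    A returned claim is only *unverified*, not proven false: the graph may
    simply be incomplete. Assumption: exact matching after normalization.
    """
    known = {normalize(t) for t in graph}
    return [normalize(c) for c in claims if normalize(c) not in known]


# Claims assumed to have been extracted from a model's output by an upstream step.
claims = [
    Triple("Metformin", "treats", "Type 2 Diabetes"),      # present in the toy graph
    Triple("metformin", "treats", "bacterial infection"),  # absent, so it is flagged
]
unverified = screen(claims)
```

In this sketch an output would be flagged for review whenever `unverified` is non-empty. Exact set membership keeps the check deterministic regardless of how the text was generated, which is the property the abstract emphasizes; a real system would also need entity linking and synonym handling, which are omitted here.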

Grants and funding

3P30CA016087-41S1/U.S. Department of Health & Human Services | NIH | National Cancer Institute (NCI)