Improving the Quality of Unstructured Cancer Data Using Large Language Models: A German Oncological Case Study

Stud Health Technol Inform. 2024 Aug 22:316:685-689. doi: 10.3233/SHTI240507.

Abstract

With cancer being a leading cause of death globally, epidemiological and clinical cancer registration is paramount for enhancing oncological care and facilitating scientific research. However, the heterogeneous landscape of medical data presents significant challenges to the current manual process of tumor documentation. This paper explores the potential of Large Language Models (LLMs) for transforming unstructured medical reports into the structured format mandated by the German Basic Oncology Dataset. Our findings indicate that integrating LLMs into existing hospital data management systems or cancer registries can significantly enhance the quality and completeness of cancer data collection, a vital component for diagnosing and treating cancer and for improving the effectiveness and benefits of therapies. This work contributes to the broader discussion on the potential of artificial intelligence, especially LLMs, to revolutionize medical data processing and reporting in general and cancer care in particular.
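
As a rough illustration of the kind of transformation described above, the following minimal sketch builds an extraction prompt for a free-text report and checks a model's reply against a fixed set of keys. It is not the authors' pipeline: the field names are hypothetical placeholders rather than the actual German Basic Oncology Dataset schema, the helper names `build_prompt` and `parse_response` are invented for this example, and the call to an LLM is deliberately left out.

```python
import json

# Hypothetical target fields for illustration only; these are NOT the
# actual fields of the German Basic Oncology Dataset.
TARGET_FIELDS = ("primary_site", "histology", "diagnosis_date")


def build_prompt(report_text, fields=TARGET_FIELDS):
    """Compose a prompt asking for a JSON object with a fixed set of keys."""
    keys = ", ".join(fields)
    return (
        "Extract the following fields from the medical report below. "
        f"Answer with a single JSON object using exactly these keys: {keys}. "
        "Use null for any field the report does not state.\n\n"
        f"Report:\n{report_text}"
    )


def parse_response(raw, fields=TARGET_FIELDS):
    """Parse a model reply and keep only the expected keys.

    Missing keys become None and unexpected keys are dropped.
    Raises ValueError if the reply is not a JSON object.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    return {name: data.get(name) for name in fields}
```

Constraining the reply to a fixed key set and mapping unstated values to `None` keeps gaps in the source report visible for manual review instead of letting them pass silently into the structured record.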

Keywords: Cancer Registry; Data Quality; Information Extraction; Large Language Models; Prompt Engineering.

MeSH terms

  • Artificial Intelligence
  • Data Accuracy
  • Electronic Health Records*
  • Germany
  • Humans
  • Medical Oncology
  • Natural Language Processing*
  • Neoplasms* / therapy
  • Registries