Using large language models for safety-related table summarization in clinical study reports

JAMIA Open. 2024 May 29;7(2):ooae043. doi: 10.1093/jamiaopen/ooae043. eCollection 2024 Jul.

Abstract

Objectives: The generation of structured documents for clinical trials is a promising application of large language models (LLMs). We share opportunities, insights, and challenges from a competitive challenge that used LLMs for automating clinical trial documentation.

Materials and methods: As part of a challenge initiated by Pfizer (organizer), several teams (participants) each created a pilot for generating summaries of safety tables for clinical study reports (CSRs). Our evaluation framework used automated metrics and expert reviews to assess the quality of the AI-generated documents.
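
The abstract does not name the automated metrics used in the evaluation framework. Purely as an illustration, the sketch below computes a unigram-overlap (ROUGE-1-style) score between an AI-generated table summary and a human reference, the kind of automated check such a framework might combine with expert review; all function names and example strings are hypothetical.

```python
# Hypothetical illustration: a unigram-overlap (ROUGE-1-style) score, one kind of
# automated metric an evaluation framework for AI-generated summaries might use.
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Return the unigram F1 overlap between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Example strings are invented, not taken from the study.
reference = "Headache was the most frequently reported adverse event in both treatment arms."
candidate = "The most common adverse event reported in both arms was headache."
print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.2f}")
```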

Results: The comparative analysis revealed differences in performance across solutions, particularly in factual accuracy and lean writing. Most participants employed prompt engineering with generative pre-trained transformer (GPT) models.
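
The abstract does not disclose the participants' prompts or model configurations. Assuming a GPT-3.5-class model accessed through the OpenAI Python client, the sketch below shows what prompt engineering for safety-table summarization could look like; the prompt wording, table content, and model name are placeholders, not the teams' actual solutions.

```python
# Hypothetical sketch of prompt engineering for safety-table summarization.
# The prompt wording, table content, and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

safety_table = """Adverse Event | Drug X (n=250) | Placebo (n=248)
Headache      | 31 (12.4%)     | 18 (7.3%)
Nausea        | 22 (8.8%)      | 9 (3.6%)"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,  # deterministic output favors factual consistency
    messages=[
        {"role": "system",
         "content": "You are a medical writer drafting clinical study report text. "
                    "Summarize only what the table states; do not add interpretation."},
        {"role": "user",
         "content": f"Summarize the following safety table in two concise sentences:\n{safety_table}"},
    ],
)
print(response.choices[0].message.content)
```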

Discussion: We discuss areas for improvement, including better ingestion of tables, the addition of context, and fine-tuning.

Conclusion: The challenge results demonstrate the potential of LLMs in automating table summarization in CSRs while also revealing the importance of human involvement and continued research to optimize this technology.

Keywords: GPT-3.5; clinical trials; generative artificial intelligence; large language models; natural language processing; regulatory documents; text summarization.

Publication types

  • Case Reports