Objectives: The generation of structured documents for clinical trials is a promising application of large language models (LLMs). We share opportunities, insights, and challenges from a competition that used LLMs to automate clinical trial documentation.
Materials and methods: As part of a challenge initiated by Pfizer (organizer), several teams (participants) created a pilot for generating summaries of safety tables for clinical study reports (CSRs). Our evaluation framework used automated metrics and expert reviews to assess the quality of AI-generated documents.
Results: The comparative analysis revealed differences in performance across solutions, particularly in factual accuracy and lean writing. Most participants employed prompt engineering with generative pre-trained transformer (GPT) models.
Discussion: We discuss areas for improvement, including better ingestion of tables, the addition of context, and fine-tuning.
Conclusion: The challenge results demonstrate the potential of LLMs in automating table summarization in CSRs while also revealing the importance of human involvement and continued research to optimize this technology.
Keywords: GPT-3.5; clinical trials; generative artificial intelligence; large language models; natural language processing; regulatory documents; text summarization.
© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.