Objectives: Extracting the sample size from randomized controlled trials (RCTs) remains a challenge for developing better search functionalities and for automating systematic reviews. Most current approaches rely on the sample size being explicitly stated in the abstract. The objective of this study was, therefore, to develop and validate additional approaches.
Materials and methods: A total of 847 RCTs from high-impact medical journals were tagged with 6 different entities that could indicate the sample size. A named entity recognition (NER) model was trained to extract these entities and was then deployed on a test set of 150 RCTs. Each entity's performance in predicting the actual number of randomized trial participants was assessed, and possible combinations of the entities were evaluated to create predictive models. The test set was also used to evaluate the performance of GPT-4o on the same task.
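As a rough illustration of the extraction step, the sketch below shows how a fine-tuned token-classification model could be applied to an abstract using the Hugging Face transformers pipeline; the model name, label set, and example text are placeholders and not the resources used in the study.

```python
# Minimal sketch of entity extraction from an RCT abstract, assuming a
# token-classification model fine-tuned on sample-size entities
# (model name and labels below are illustrative, not the authors').
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="my-org/rct-sample-size-ner",  # hypothetical fine-tuned model
    aggregation_strategy="simple",       # merge word pieces into full spans
)

abstract = (
    "We randomly assigned 424 patients to receive drug A (n=212) "
    "or placebo (n=212)."
)

# Each result carries the entity label, the matched text, and a confidence score.
for ent in ner(abstract):
    print(ent["entity_group"], ent["word"], f"{ent['score']:.3f}")
```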
Results: The most accurate model could make a prediction for 64.7% of trials in the test set, and its predictions matched the ground truth in 93.8% of cases. GPT-4o made a prediction for 94.7% of trials, and its predictions matched the ground truth in 90.8% of cases.
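For comparison, a GPT-4o query for the same task could look roughly like the following sketch with the OpenAI Python client; the prompt wording is an assumption rather than the study's exact instruction.

```python
# Hedged sketch of asking GPT-4o for the number of randomized participants;
# the system prompt here is illustrative, not the study's prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

abstract = (
    "We randomly assigned 424 patients to receive drug A (n=212) "
    "or placebo (n=212)."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Return only the total number of randomized participants, "
                    "or 'unknown' if the abstract does not state it."},
        {"role": "user", "content": abstract},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
```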
Discussion: This study presents an NER model that extracts entities from the abstract of an RCT from which the sample size can be predicted. The entities can be combined in different ways to obtain models with different characteristics.
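One plausible way to combine such entities into a single prediction, shown purely as an illustration and not necessarily the authors' scheme, is a fallback rule that prefers an explicitly stated total and otherwise sums per-arm counts.

```python
# Illustrative combination of extracted entities into a sample-size prediction;
# the entity names and fallback order are assumptions.
from typing import Optional


def predict_sample_size(entities: dict[str, list[int]]) -> Optional[int]:
    """Prefer an explicitly stated total; otherwise sum per-arm counts."""
    if entities.get("TOTAL_RANDOMIZED"):
        return entities["TOTAL_RANDOMIZED"][0]
    if entities.get("ARM_SIZE"):
        return sum(entities["ARM_SIZE"])
    return None  # abstain when no usable entity was found


print(predict_sample_size({"ARM_SIZE": [212, 212]}))  # -> 424
```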
Conclusion: Training an NER model to predict the sample size of RCTs is feasible. Large language models can deliver similar performance without prior training on the task, although at a higher cost due to proprietary technology and/or the required computational power.
Keywords: GPT-4; evidence-based medicine; machine learning; natural language processing; randomized controlled trial; text mining; transformer.