Purpose: A semiautomated pipeline for the collection and curation of free-text and imaging real-world data (RWD) was developed to quantify cancer treatment outcomes in large-scale retrospective real-world studies. The objectives of this article are to illustrate the challenges of RWD extraction, to demonstrate approaches for quality assurance, and to showcase the potential of RWD for precision oncology.
Methods: We collected data from patients with advanced melanoma receiving immune checkpoint inhibitors at the Lausanne University Hospital. Cohort selection relied on semantically annotated electronic health records and was validated using process mining. The selected imaging examinations were segmented using an automatic commercial software prototype. A postprocessing algorithm enabled longitudinal lesion identification across imaging time points and consensus malignancy status prediction. The resulting data quality was evaluated against expert-annotated ground truth and clinical outcomes obtained from radiology reports.
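To make the idea of a consensus malignancy status concrete, the following is a minimal sketch and not the postprocessing algorithm used in the study. It assumes that lesions have already been linked across time points under a shared identifier, and that consensus is reached by a simple majority vote over the per-time-point predictions, with ties counted as malignant. The function name `consensus_malignancy`, the tuple layout, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_malignancy(detections):
    """Return one malignancy label per tracked lesion.

    detections: iterable of (lesion_id, time_point, predicted_malignant)
    tuples, where lesion_id is assumed to be the result of a prior
    longitudinal matching step (hypothetical input format).
    """
    votes = defaultdict(list)
    for lesion_id, _time_point, predicted_malignant in detections:
        votes[lesion_id].append(bool(predicted_malignant))

    consensus = {}
    for lesion_id, labels in votes.items():
        counts = Counter(labels)
        # Majority vote; a tie is resolved as malignant (an assumption).
        consensus[lesion_id] = counts[True] >= counts[False]
    return consensus


example = [
    ("A", 0, True), ("A", 1, False), ("A", 2, True),
    ("B", 0, False), ("B", 1, False), ("B", 2, True),
]
labels = consensus_malignancy(example)
```

In this toy input, lesion "A" is labeled malignant at two of three time points and lesion "B" at one of three, so the vote overrides the single inconsistent prediction for each lesion.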
Results: The cohort included 108 patients with melanoma and 465 imaging examinations (median, 3; range, 1-15 per patient). Process mining was used to assess clinical data quality and revealed the diversity of care pathways encountered in a real-world setting. Longitudinal postprocessing greatly improved the consistency of image-derived data compared with single time point segmentation results (classification precision increased from 53% to 86%). Image-derived progression-free survival resulting from postprocessing was comparable with the manually curated clinical reference (median survival of 286 v 336 days, P = .89).
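The two kinds of figures reported above can be illustrated with a short sketch. Precision is the standard ratio of true positives to all positive predictions. For the median survival, the sketch assumes a Kaplan-Meier estimate, which the abstract does not name explicitly. The function names and the toy inputs are hypothetical and are not study data.

```python
from fractions import Fraction


def precision(predicted, truth):
    """Precision = TP / (TP + FP) over lesions keyed by identifier."""
    tp = sum(1 for key, pred in predicted.items() if pred and truth[key])
    fp = sum(1 for key, pred in predicted.items() if pred and not truth[key])
    return tp / (tp + fp) if (tp + fp) else float("nan")


def km_median(durations, events):
    """Kaplan-Meier median: first time at which estimated survival <= 0.5.

    durations: follow-up time per patient (e.g., days).
    events: 1 if progression or death was observed, 0 if censored.
    Returns None when survival never drops to 0.5.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 threshold
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += int(data[i][1])
            leaving += 1
            i += 1
        if observed:
            survival *= Fraction(at_risk - observed, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None


toy_precision = precision(
    {"a": True, "b": True, "c": False, "d": True},
    {"a": True, "b": False, "c": True, "d": True},
)
toy_median = km_median([100, 150, 200, 300, 400], [1, 0, 1, 1, 0])
```

Comparing two such survival curves, as in the reported P value, would require a separate statistical test that is not shown here.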
Conclusion: We presented a general pipeline for the collection and curation of text- and image-based RWD, together with specific strategies to improve reliability. We showed that the resulting disease progression measures match reference clinical assessments at the cohort level, indicating that this strategy has the potential to unlock large amounts of actionable retrospective real-world evidence from clinical records.