End-to-end framework for automated collection of large multicentre radiotherapy datasets demonstrated in a Danish Breast Cancer Group cohort

Phys Imaging Radiat Oncol. 2023 Aug 25:27:100485. doi: 10.1016/j.phro.2023.100485. eCollection 2023 Jul.

Abstract

Large Digital Imaging and Communications in Medicine (DICOM) datasets are key to supporting research and the development of machine learning technology in radiotherapy (RT). However, tools for multicentre data collection, curation and standardisation are not readily available. Automated batch DICOM export solutions were demonstrated for a multicentre setup. A Python solution, Collaborative DICOM analysis for RT (CORDIAL-RT), was developed for curation, standardisation, and analysis of the collected data. The setup was demonstrated in the Danish Breast Cancer Group (DBCG) RT-Nation study, where 86% (n = 7748) of treatments in the inclusion period were collected and quality assured, supporting the applicability of the end-to-end framework.

Keywords: Automation; Big data; Breast cancer; DICOM; Data collection; Data science; Radiotherapy.