A Novel Benchmarking Approach to Assess the Agreement among Radiomic Tools

Radiology. 2022 Jun;303(3):533-541. doi: 10.1148/radiol.211604. Epub 2022 Mar 1.

Abstract

Background: The translation of radiomic models into clinical practice is hindered by the limited reproducibility of features across software and studies. Standardization is needed to accelerate this process and to bring radiomics closer to clinical deployment.

Purpose: To assess the standardization level of seven radiomic software programs and investigate software agreement as a function of built-in image preprocessing (eg, interpolation and discretization), feature aggregation methods, and the morphological characteristics (ie, volume and shape) of the region of interest (ROI).

Materials and Methods: The study was organized into two phases. In phase I, the two Image Biomarker Standardization Initiative (IBSI) phantoms were used to evaluate the IBSI compliance of seven software programs. In phase II, the reproducibility of all IBSI-standardized radiomic features across tools was assessed with two custom Italian multicenter Shared Understanding of Radiomic Extractors (ImSURE) digital phantoms that allowed, in conjunction with a systematic feature extraction, observations on whether and how feature matches between program pairs varied depending on the preprocessing steps, aggregation methods, and ROI characteristics.

Results: In phase I, the software programs showed different levels of completeness (ie, the number of computable IBSI benchmark values). However, the IBSI-compliance assessment revealed that they were all standardized in terms of feature implementation. When considering additional preprocessing steps, for each individual program, match percentages fell by up to 30%. In phase II, the ImSURE phantoms showed that software agreement was dependent on discretization and aggregation as well as on ROI shape and volume factors.

Conclusion: The agreement of radiomic software varied in relation to factors that had already been standardized (eg, interpolation and discretization methods) and factors that need standardization. Both dependences must be resolved to ensure the reproducibility of radiomic features and to pave the way toward the clinical adoption of radiomic models.

Published under a CC BY 4.0 license. Online supplemental material is available for this article. See also the editorial by Steiger in this issue. An earlier incorrect version appeared online and in print. This article was corrected on March 2, 2022.
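
As a rough illustration of the kind of pairwise comparison the Results describe, the sketch below computes, for each pair of programs, the percentage of shared features whose values agree within a relative tolerance. The function name `pairwise_match_percentage`, the 1% tolerance, the program and feature names, and the toy values are all assumptions made for illustration; the abstract does not state the matching criterion used in the study.

```python
from itertools import combinations
import math


def pairwise_match_percentage(features_by_program, rel_tol=0.01):
    """Return {(prog_a, prog_b): percent of shared features that agree}.

    features_by_program maps a program name to {feature_name: value}.
    rel_tol is a hypothetical relative tolerance, not taken from the study.
    """
    results = {}
    for prog_a, prog_b in combinations(sorted(features_by_program), 2):
        values_a = features_by_program[prog_a]
        values_b = features_by_program[prog_b]
        # Only features that both programs can compute are compared.
        shared = sorted(set(values_a) & set(values_b))
        if not shared:
            results[(prog_a, prog_b)] = None
            continue
        matches = sum(
            math.isclose(values_a[name], values_b[name], rel_tol=rel_tol)
            for name in shared
        )
        results[(prog_a, prog_b)] = 100.0 * matches / len(shared)
    return results


# Toy values, invented purely to exercise the function.
toy = {
    "program_A": {"mean": 10.0, "variance": 4.00, "energy": 250.0},
    "program_B": {"mean": 10.0, "variance": 4.02, "energy": 300.0},
    "program_C": {"mean": 10.0, "variance": 5.00},
}
agreement = pairwise_match_percentage(toy)
```

With these toy values, `program_A` and `program_B` agree on two of their three shared features, and each of them agrees with `program_C` on one of two. Features that one program cannot compute are excluded from that pair's denominator, which loosely mirrors the distinction the abstract draws between completeness and agreement.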

Publication types

  • Multicenter Study

MeSH terms

  • Benchmarking*
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Phantoms, Imaging
  • Reproducibility of Results
  • Software