Background: Measures of tumour heterogeneity derived from 18-fluoro-2-deoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) scans are increasingly reported as potential biomarkers of non-small cell lung cancer (NSCLC) for classification and prognostication. Several segmentation algorithms have been used to delineate tumours, but their effects on the reproducibility and predictive and prognostic capability of derived parameters have not been evaluated. The purpose of our study was to retrospectively compare various segmentation algorithms in terms of inter-observer reproducibility and prognostic capability of texture parameters derived from non-small cell lung cancer (NSCLC) 18F-FDG PET/CT images. Fifty three NSCLC patients (mean age 65.8 years; 31 males) underwent pre-chemoradiotherapy 18F-FDG PET/CT scans. Three readers segmented tumours using freehand (FH), 40% of maximum intensity threshold (40P), and fuzzy locally adaptive Bayesian (FLAB) algorithms. Intraclass correlation coefficient (ICC) was used to measure the inter-observer variability of the texture features derived by the three segmentation algorithms. Univariate cox regression was used on 12 commonly reported texture features to predict overall survival (OS) for each segmentation algorithm. Model quality was compared across segmentation algorithms using Akaike information criterion (AIC).
Results: 40P was the most reproducible algorithm (median ICC 0.9; interquartile range [IQR] 0.85-0.92) compared with FLAB (median ICC 0.83; IQR 0.77-0.86) and FH (median ICC 0.77; IQR 0.7-0.85). On univariate cox regression analysis, 40P found 2 out of 12 variables, i.e. first-order entropy and grey-level co-occurence matrix (GLCM) entropy, to be significantly associated with OS; FH and FLAB found 1, i.e., first-order entropy. For each tested variable, survival models for all three segmentation algorithms were of similar quality, exhibiting comparable AIC values with overlapping 95% CIs.
Conclusions: Compared with both FLAB and FH, segmentation with 40P yields superior inter-observer reproducibility of texture features. Survival models generated by all three segmentation algorithms are of at least equivalent utility. Our findings suggest that a segmentation algorithm using a 40% of maximum threshold is acceptable for texture analysis of 18F-FDG PET in NSCLC.
Keywords: 18F-FDG PET; Inter-observer reproducibility; Non-small cell lung cancer; Prognosis; Segmentation.