Objective: To measure and describe patterns of interobserver variation in the visual interpretation of 18F-FDG PET in malignant lymphoma.
Methods: Eleven nuclear medicine physicians with different levels of PET experience independently reviewed 37 18F-FDG PET scans of lymphoma patients (10 obtained at presentation, 27 during or after therapy). They were asked to identify and localize suspicious lymphoma sites and assign a stage on the baseline scans, and to interpret the remaining scans for the presence of viable lymphoma. Each nodal and extranodal region was scored for the likelihood of malignancy as positive, negative or equivocal. These scores were dichotomized according to a conservative and a sensitive reading classification and then compared with expert readings.
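The dichotomization step can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the abstract does not say how equivocal scores were handled, so the sketch assumes that sensitive reading counts equivocal as positive and conservative reading counts it as negative. The function names `dichotomize` and `agreement_rate`, the boolean expert reference and the example scores are all hypothetical.

```python
from typing import List

SCORES = ("positive", "negative", "equivocal")
MODES = ("sensitive", "conservative")


def dichotomize(score: str, mode: str) -> bool:
    """Map a three-level region score to positive (True) or negative (False).

    Assumption (not stated in the abstract): equivocal counts as positive
    under sensitive reading and as negative under conservative reading.
    """
    if score not in SCORES:
        raise ValueError(f"unknown score: {score!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if score == "equivocal":
        return mode == "sensitive"
    return score == "positive"


def agreement_rate(observer: List[str], expert: List[bool], mode: str) -> float:
    """Fraction of regions where the dichotomized observer score matches the expert."""
    if len(observer) != len(expert):
        raise ValueError("observer and expert lists must have the same length")
    if not observer:
        raise ValueError("no regions to compare")
    matches = sum(dichotomize(s, mode) == e for s, e in zip(observer, expert))
    return matches / len(observer)


# Hypothetical scores for four regions of one scan (illustrative only).
observer_scores = ["positive", "equivocal", "negative", "equivocal"]
expert_positive = [True, True, False, True]

sensitive_rate = agreement_rate(observer_scores, expert_positive, "sensitive")
conservative_rate = agreement_rate(observer_scores, expert_positive, "conservative")
```

With these made-up scores, the two equivocal regions decide the outcome: they match the expert under sensitive reading and disagree under conservative reading, which shows why the same observer can have two different agreement rates.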
Results: Baseline scans were scored in accordance with the experts in 61% and 56% of cases (sensitive and conservative reading, respectively). Of the 27 scans obtained for therapy evaluation, the 14 with viable tumour sites were scored in accordance with the experts in 82% and 94% of cases (conservative and sensitive reading, respectively), whereas the 13 negative scans were scored in agreement with the experts in only 45% of cases. False-positive scores occurred mainly in the neck, periclavicular region, axilla, mediastinum, lung and bone marrow. More experienced observers tended to have fewer false-negative scores.
Conclusion: There are substantial disparities among nuclear medicine physicians' interpretations of FDG PET scans of lymphoma patients, which may affect patient care and results of multi-institutional clinical trials. A well-defined set of criteria is urgently needed to improve consistency.