Assessing programmed death ligand 1 (PD-L1) expression on tumor cells (TCs) using Food and Drug Administration-approved, validated immunoassays can guide the use of immune checkpoint inhibitor (ICI) therapy in cancer treatment. However, substantial interobserver variability has been reported with these immunoassays. Artificial intelligence (AI) has the potential to accurately measure biomarker expression in tissue samples, but its reliability and comparability to standard manual scoring remain to be evaluated. This multinational study sought to compare the %TC scoring of PD-L1 expression in advanced urothelial carcinoma, assessed by either an AI Measurement Model (AIM-PD-L1) or expert pathologists. The concordance among pathologists and between pathologists and AIM-PD-L1 was determined. The positivity rate at ≥ 1%TC PD-L1 expression was between 20% and 30% for 8/10 pathologists, and the degree of agreement and the scoring distribution among pathologists and between pathologists and AIM-PD-L1 were similar, whether PD-L1 expression was scored as a continuous variable or using the pre-defined cutoff. Numerically higher score variation was observed with the 22C3 assay than with the 28-8 assay. A 2-h training module on the 28-8 assay did not significantly impact manual assessment. Cases exhibiting significantly higher variability in the assessment of PD-L1 expression (mean absolute deviation > 10) were found to have patterns of PD-L1 staining that were more challenging to interpret. An improved understanding of the sources of manual scoring variability can be applied to PD-L1 expression analysis in the clinical setting. In the future, AI algorithms could serve as a valuable reference guide for pathologists when scoring PD-L1.
Keywords: Artificial intelligence; Bladder cancer; PD-L1; Pathology.
© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.