The analytical and clinical validity of AI algorithms to score TILs in TNBC: can we use different machine learning models interchangeably?

Joan Martínez Vidal; Nikos Tsiknakis; Johan Staaf; Ana Bosch; Anna Ehinger; Emma Nimeus; Roberto Salgado; Yalai Bai; David L Rimm; Johan Hartman; Balazs Acs

doi:10.1016/j.eclinm.2024.102928

EClinicalMedicine. 2024 Nov 15:78:102928. doi: 10.1016/j.eclinm.2024.102928. eCollection 2024 Dec.

Authors

Joan Martínez Vidal¹; Nikos Tsiknakis¹; Johan Staaf²; Ana Bosch²,³; Anna Ehinger⁴; Emma Nimeus²,⁵,⁶; Roberto Salgado⁷,⁸; Yalai Bai⁹; David L Rimm⁹,¹⁰; Johan Hartman¹,¹¹; Balazs Acs¹,¹¹

Affiliations

¹ Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden.
² Division of Oncology, Department of Clinical Sciences Lund, Lund University, Medicon Village, SE-22381, Lund, Sweden.
³ Department of Hematology, Oncology and Radiation Physics, Region Skåne, Lund, Sweden.
⁴ Department of Genetics, Pathology and Molecular Diagnostics, Laboratory Medicine, Region Skåne, Lund, Sweden.
⁵ Division of Surgery, Department of Clinical Sciences, Lund University, Lund, Sweden.
⁶ Department of Surgery, Skåne University Hospital, Malmö, Sweden.
⁷ Department of Pathology, GZA-ZNA Hospitals, Antwerp, Belgium.
⁸ Division of Research, Peter MacCallum Cancer Centre, Melbourne, Australia.
⁹ Department of Pathology, Yale School of Medicine, New Haven, CT, USA.
¹⁰ Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT, USA.
¹¹ Department of Clinical Pathology and Cancer Diagnostics, Karolinska University Hospital, Stockholm, Sweden.

Abstract

Background: Pathologist-read tumor-infiltrating lymphocytes (TILs) have shown predictive and prognostic potential in early and metastatic triple-negative breast cancer (TNBC), but their assessment is still subject to variability. Artificial intelligence (AI) is a promising approach toward eliminating variability and objectively automating TILs assessment. However, demonstrating robust analytical and prognostic validity is the key challenge currently preventing the integration of AI-based TILs scoring into clinical workflows.

Methods: We evaluated the impact of ten AI models on TILs scoring, focusing on how they differ in analytical and prognostic validity. The AI-based TILs scoring models (seven developed and three previously validated AI models) were tested in a retrospective analytical cohort and in an independent prospective cohort to compare prognostic validity against the invasive disease-free survival (IDFS) endpoint, with a median follow-up of 4 years. The development and analytical validity set consisted of diagnostic tissue slides of 79 women with surgically resected primary invasive TNBC tumors diagnosed between 2012 and 2016 at the Yale School of Medicine. An independent set comprising 215 TNBC patients from Sweden, diagnosed between 2010 and 2015, was used for testing prognostic validity.

Findings: A significant difference in analytical validity (Spearman's r = 0.63-0.73, p < 0.001) is highlighted across AI methodologies and training strategies. Interestingly, digital TILs showed prognostic performance for eight out of ten AI models, including less extensively trained ones, with similar and overlapping hazard ratios (HR) in the external validation cohort (Cox regression analysis based on the IDFS endpoint, HR = 0.40-0.47; p < 0.004).
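
As an illustration of the analytical-validity metric reported above, the following is a minimal sketch of how Spearman's rank correlation between a reference TILs score and a model's TILs score could be computed. It is not the authors' pipeline: the function names (`average_ranks`, `spearman_r`) and the example scores are hypothetical and were chosen only for demonstration, and the sketch does not compute a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical TILs scores (percent) for six slides; NOT data from the study.
reference_scores = [5, 10, 20, 40, 60, 80]
model_scores = [8, 6, 25, 35, 70, 65]

print(round(spearman_r(reference_scores, model_scores), 2))
```

Because the statistic depends only on ranks, two models can order patients similarly (and therefore stratify risk similarly) while still disagreeing on the absolute TILs percentages, which is one reason analytical and prognostic validity need to be assessed separately.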

Interpretation: The demonstrated prognostic validity for most of the AI TIL models can be attributed to the intrinsic robustness of host anti-tumor immunity (measured by TILs) as a biomarker. However, the discrepancies between AI models should not be overlooked; rather, we believe that there is a critical need for an accessible, large, multi-centric dataset that will serve as a benchmark ensuring the comparability and reliability of different AI tools in clinical implementation.

Funding: Nikos Tsiknakis is supported by the Swedish Research Council (Grant Number 2021-03061, Theodoros Foukakis). Balazs Acs is supported by The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning) postdoctoral grant. Roberto Salgado is supported by a grant from Breast Cancer Research Foundation (BCRF).

Keywords: Artificial intelligence; Breast cancer; Deep learning; Machine learning; TILs; Tumor infiltrating lymphocytes.