The tumor microenvironment is widely recognized for its central role in driving cancer progression and influencing prognostic outcomes. Increasing effort has been devoted to characterizing this complex and heterogeneous environment, including the development of potential prognostic tools based on modern deep learning methods. However, the identification of generalizable data-driven biomarkers has been limited, in part because the complex, black-box predictions made by these models cannot be interpreted. In this study, we introduce a data-driven yet interpretable approach for identifying patterns of cellular organization in the tumor microenvironment that are associated with patient prognosis. Our methodology relies on the construction of a bi-level graph model: (i) a cellular graph, which models the intricate tumor microenvironment, and (ii) a population graph, which captures inter-patient similarities computed from the patients' respective cellular graphs by means of a soft Weisfeiler-Lehman subtree kernel. This systematic integration of information across different scales enables us to identify patient subgroups with distinct prognoses while unveiling the tumor microenvironment patterns that characterize them. We demonstrate our approach in a cohort of breast cancer patients and show that the identified tumor microenvironment patterns yield a risk stratification system that provides new information complementary to standard stratification systems. Our results, which are validated in two independent cohorts, allow for new insights into the prognostic implications of the breast tumor microenvironment. This methodology could be applied more generally to other cancer types, providing insights into the patterns of cellular organization associated with different outcomes.
Keywords: breast cancer; graph kernel; graph learning; prognosis; spatial analysis; tumor microenvironment.
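
The bi-level construction described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it substitutes the standard (hard) Weisfeiler-Lehman subtree kernel for the soft variant, whose details are not given here, and the function names, cell-type labels, similarity threshold, and number of iterations are all assumptions made for illustration. Each cellular graph is reduced to counts of relabeled neighborhood patterns, and patients whose normalized kernel similarity reaches the threshold are linked in the population graph.

```python
import math
from collections import Counter


def wl_subtree_counts(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree patterns in one cellular graph.

    adjacency: dict mapping each cell to a list of neighboring cells.
    labels: dict mapping each cell to a string label (e.g. a cell type).
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbor labels.
        current = {
            node: (current[node], tuple(sorted(current[n] for n in neighbors)))
            for node, neighbors in adjacency.items()
        }
        counts.update(current.values())
    return counts


def kernel(counts_a, counts_b):
    """Hard WL subtree kernel: dot product of pattern-count vectors."""
    return sum(counts_a[p] * counts_b[p] for p in counts_a.keys() & counts_b.keys())


def normalized_kernel(counts_a, counts_b):
    """Cosine-normalized kernel value in [0, 1]."""
    denom = math.sqrt(kernel(counts_a, counts_a) * kernel(counts_b, counts_b))
    return kernel(counts_a, counts_b) / denom if denom else 0.0


def build_population_graph(cellular_graphs, threshold=0.5, iterations=2):
    """Link patients whose cellular graphs are similar (threshold is an assumption).

    cellular_graphs: list of (adjacency, labels) pairs, one per patient.
    Returns a dict mapping patient index pairs (i, j) to their similarity.
    """
    feats = [wl_subtree_counts(adj, lab, iterations) for adj, lab in cellular_graphs]
    edges = {}
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            similarity = normalized_kernel(feats[i], feats[j])
            if similarity >= threshold:
                edges[(i, j)] = similarity
    return edges


# Toy example with hypothetical cell types: three fully connected cells per patient.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
patient_a = (triangle, {0: "tumor", 1: "immune", 2: "stromal"})
patient_b = (triangle, {0: "tumor", 1: "immune", 2: "stromal"})
patient_c = (triangle, {0: "endothelial", 1: "endothelial", 2: "endothelial"})

print(build_population_graph([patient_a, patient_b, patient_c]))
```

In this toy example only the two patients with matching cellular graphs are linked. A soft kernel would replace the exact pattern match in `kernel` with a graded similarity between patterns, so that near-identical neighborhoods also contribute.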