Background: Cholangiocarcinoma (CC) is a rare and aggressive disease with limited therapeutic options and a poor prognosis. All publicly available cohorts reporting transcriptomic data on intrahepatic cholangiocarcinoma (ICC) and extrahepatic cholangiocarcinoma (ECC) were collected with the aim of providing a comprehensive, clinically relevant classification based on gene expression.
Methods: Primary tumor tissues from a total of 543 patients, profiled on RNAseq and microarray platforms across seven public datasets, were used as a discovery set to identify distinct biological subgroups. Using machine learning techniques, group predictors developed on the discovery sets were applied to a single cohort of 131 patients profiled by RNAseq for validation and assessment of clinical relevance.
Results: Unsupervised clustering of gene expression data identified four subgroups in both the ICC and ECC discovery datasets, each characterized by a distinct type of immune infiltrate and distinct signaling pathways. We then developed class predictors based on short gene signatures and, in an independent dataset, identified subgroups of ICC tumors with different prognoses.
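To make the idea of a single-sample class predictor built on a short gene signature concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify the classifier, so a nearest-centroid rule is assumed here purely for illustration. The gene names (`GENE_A`, `GENE_B`), subgroup labels, expression values, and function names are all hypothetical.

```python
from math import sqrt

def fit_centroids(samples, labels, signature):
    """Average each signature gene's expression within each subgroup."""
    sums, counts = {}, {}
    for expr, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(signature))
        for i, gene in enumerate(signature):
            acc[i] += expr[gene]
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(expr, centroids, signature):
    """Assign one sample to the subgroup whose centroid is closest (Euclidean)."""
    def dist(centroid):
        return sqrt(sum((expr[g] - c) ** 2 for g, c in zip(signature, centroid)))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical two-gene signature and toy discovery data (illustrative values only).
signature = ["GENE_A", "GENE_B"]
discovery = [
    {"GENE_A": 5.0, "GENE_B": 1.0},
    {"GENE_A": 6.0, "GENE_B": 2.0},
    {"GENE_A": 1.0, "GENE_B": 5.0},
    {"GENE_A": 2.0, "GENE_B": 6.0},
]
labels = ["subgroup_1", "subgroup_1", "subgroup_2", "subgroup_2"]

centroids = fit_centroids(discovery, labels, signature)
new_sample = {"GENE_A": 5.2, "GENE_B": 1.1}
assigned = predict(new_sample, centroids, signature)
```

Because the rule needs only the stored centroids and the signature genes of the sample being classified, it can be applied to one tumor at a time without re-clustering a cohort, which is the property a single-sample predictor relies on.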
Conclusions: The class predictor developed here allows identification of CC subgroups with specific biological features and clinical behavior at the single-sample level. These results represent a starting point for a complete molecular characterization of CC, including integration of genomic data, with a view to application in clinical practice.
Keywords: bioinformatics; cholangiocarcinoma; next generation sequencing; transcriptomics; tumor-infiltrating immune cells.
© 2023 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.