Background: There is currently no formal consensus on the administration of adjuvant chemotherapy to stage I lung squamous cell carcinoma (LUSC) patients despite the poor prognosis. The side effects of adjuvant chemotherapy need to be balanced against the risk of tumour recurrence. Prognostic markers are thus needed to identify patients at higher risk and to recommend individualised treatment regimens.
Methods: Clinical and sequencing data of stage I patients were retrieved from the Lung Squamous Cell Carcinoma project of The Cancer Genome Atlas (TCGA) and three tissue microarray datasets. In a novel K-resample gene selection algorithm, gene-wise Cox proportional hazards regressions were repeated for 50 iterations on random resamples of the TCGA training dataset. The 200 genes with the best predictive power for survival then underwent L1-penalised Cox regression for further gene selection.
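As an illustration of the resampling step described in the Methods, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the abstract does not specify: each iteration draws a random 80% subsample without replacement, each gene is scored with a univariate Cox score-test z statistic at beta = 0 (a closed-form stand-in for fitting a full gene-wise Cox regression), and genes are ranked by their summed |z| across iterations. The names `cox_score_z` and `k_resample_rank` are hypothetical, and the subsequent L1-penalised Cox regression is not shown.

```python
import math
import random


def cox_score_z(times, events, x):
    """Univariate Cox score-test z statistic at beta = 0 (Breslow handling of ties).

    Stand-in for a fitted gene-wise Cox regression (assumption, see lead-in).
    A negative z means higher expression goes with longer survival.
    """
    u = 0.0  # score: sum over events of (x_i - mean of x in the risk set)
    v = 0.0  # information: sum over events of the risk-set variance of x
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored observations contribute only via risk sets
        risk = [x[j] for j in range(n) if times[j] >= times[i]]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0


def k_resample_rank(times, events, expr, n_iter=50, frac=0.8, top_n=200, seed=0):
    """Rank genes by summed |z| over repeated random subsamples.

    expr maps gene name -> list of expression values (one per patient).
    frac and the summed-|z| aggregation are assumptions, not from the abstract.
    """
    rng = random.Random(seed)
    n = len(times)
    size = max(2, int(frac * n))
    totals = {gene: 0.0 for gene in expr}
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)  # subsample without replacement
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        for gene, values in expr.items():
            totals[gene] += abs(cox_score_z(t, e, [values[i] for i in idx]))
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:top_n]
```

Aggregating over many resamples favours genes whose association with survival is stable rather than driven by a few patients; the defaults `n_iter=50` and `top_n=200` mirror the numbers given in the Methods.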
Results: A total of 602 LUSC samples were included, of which 42.2% came from female patients and 45.3% were stage IA cancers. From an initial pool of 11,212 genes in the TCGA training dataset, a final set of 12 genes was selected to construct the multivariate Cox prognostic model. Among the 12 selected genes, five (STAU1, ADGRF1, ATF7IP2, MALL and KRT23) were adverse prognostic factors, while seven (NDUFB1, CNPY2, ZNF394, PIN4, FZD8, NBPF26 and EPYC) were favourable prognostic factors. A risk score equation was then constructed from the final multivariate Cox model. Model performance was tested in the sequestered TCGA testing dataset and validated in external tissue microarray datasets (GSE4573, GSE31210 and GSE50081); the model stratified patients into high- and low-risk groups with significantly different survival, both in the whole set (stage IA and IB) and in the stage IA-only subgroup of each set. The prognostic power remained significant after adjusting for standard clinical factors. When benchmarked against other prominent gene-signature-based prognostic models, the model outperformed the rest in the TCGA testing dataset and in predicting long-term risk at eight years in all three validation datasets.
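To make the risk score concrete, a score derived from a multivariate Cox model is conventionally the sum of each gene's coefficient multiplied by its expression value. The Python sketch below illustrates that form under explicit assumptions: the coefficient magnitudes are placeholders (only their signs follow the abstract, which does not report the fitted values), the median cutoff is an assumed convention rather than the authors' stated threshold, and `risk_score` and `stratify` are hypothetical names.

```python
from statistics import median

# Signs follow the abstract: adverse genes > 0, favourable genes < 0.
# The magnitudes are PLACEHOLDERS, not the published coefficients.
PLACEHOLDER_COEFS = {
    "STAU1": 1.0, "ADGRF1": 1.0, "ATF7IP2": 1.0, "MALL": 1.0, "KRT23": 1.0,
    "NDUFB1": -1.0, "CNPY2": -1.0, "ZNF394": -1.0, "PIN4": -1.0,
    "FZD8": -1.0, "NBPF26": -1.0, "EPYC": -1.0,
}


def risk_score(expression, coefs):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def stratify(scores, cutoff=None):
    """Label each score 'high' or 'low' risk.

    The default median cutoff is an assumption; scores equal to the
    cutoff are labelled 'low'.
    """
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]
```

With real data, `expression` would hold a patient's normalised expression values for the 12 genes, and the placeholder coefficients would be replaced by those of the fitted model.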
Conclusion: The 12-gene prognostic model may serve as a useful complementary clinical risk-stratification tool for stage I, and especially stage IA, lung squamous cell carcinoma patients to guide clinical decision-making.
Keywords: Gene signature; Lung squamous cell carcinoma; Prognostic model; Risk stratification.
© 2021. Federación de Sociedades Españolas de Oncología (FESEO).