SMR-guided molecular subtyping and machine learning model reveals novel prognostic biomarkers and therapeutic targets in non-small cell lung adenocarcinoma

Sci Rep. 2025 Jan 10;15(1):1640. doi: 10.1038/s41598-025-85471-8.

Abstract

Non-small cell lung adenocarcinoma (LUAD) is a markedly heterogeneous disease whose underlying molecular mechanisms and prognosis remain difficult to characterize and predict. In this study, we integrated data from multiple public datasets, including TCGA, GSE31210, and GSE13213, encompassing a total of 867 tumor samples. Using Mendelian randomization (MR) analysis, machine learning techniques, and comprehensive bioinformatics approaches, we investigated the molecular characteristics, prognostic markers, and potential therapeutic targets of LUAD. Our analysis identified 321 genes significantly associated with LUAD, with CENP-A, MCM7, and DLGAP5 emerging as highly connected nodes in network analyses. Using correlation analysis and Cox regression analysis, we identified 26 prognostic genes and classified LUAD samples into two molecular subtypes with significantly distinct survival outcomes. The Random Survival Forest (RSF) model showed robust prognostic performance across multiple independent cohorts (AUC > 0.75). Beyond predicting patient outcomes, the model also captures key features of the tumor immune microenvironment and potential therapeutic responses. Functional enrichment analysis revealed a complex interplay of cell cycle regulation, DNA repair, immune response, and metabolic reprogramming in the progression of LUAD. Furthermore, we observed a strong correlation between risk scores and the expression of specific cytokines, such as CCL17, CCR2, and CCL20, suggesting novel avenues for developing cytokine network-based therapeutic strategies. This study offers fresh insights into molecular subtyping, prognostic prediction, and personalized therapeutic decision-making in LUAD, laying a foundation for future clinical applications and targeted therapy research.
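
To make the reported discrimination figure (AUC > 0.75) more concrete, the following is a minimal sketch of how an AUC for a survival risk score might be computed at a fixed time horizon. It is not the authors' pipeline: the function name `auc_at_horizon`, the toy data, and the chosen horizon are all hypothetical, and the sketch simply drops patients censored before the horizon rather than applying inverse-probability-of-censoring weights, which a full analysis would normally use.

```python
from typing import Sequence


def auc_at_horizon(times: Sequence[float], events: Sequence[int],
                   risks: Sequence[float], horizon: float) -> float:
    """Naive cumulative/dynamic AUC of a risk score at a fixed time horizon.

    Cases:    patients with an observed event at or before `horizon`.
    Controls: patients still under follow-up after `horizon`.
    Patients censored before `horizon` are dropped (assumption: no
    censoring weights are applied in this simplified version).
    """
    cases = [r for t, e, r in zip(times, events, risks)
             if e == 1 and t <= horizon]
    controls = [r for t, e, r in zip(times, events, risks) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Fraction of case/control pairs in which the case has the higher
    # risk score; tied scores count as half a concordant pair.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data invented for illustration only (not taken from the study).
times = [5, 12, 20, 30, 40, 50]           # follow-up time, e.g. months
events = [1, 1, 0, 1, 0, 0]               # 1 = death observed, 0 = censored
risks = [0.9, 0.4, 0.8, 0.6, 0.5, 0.1]    # hypothetical model risk scores
auc = auc_at_horizon(times, events, risks, horizon=24)
print(f"AUC at 24 months: {auc:.3f}")
```

An AUC of 0.5 corresponds to a risk score that ranks patients no better than chance, and 1.0 to a score that ranks every early-event patient above every patient surviving past the horizon.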

Keywords: Machine learning prognostic model; Mendelian randomization; Molecular subtypes; Multi-omics integrative analysis; Non-small cell lung adenocarcinoma, LUAD.

MeSH terms

  • Adenocarcinoma of Lung* / genetics
  • Adenocarcinoma of Lung* / metabolism
  • Adenocarcinoma of Lung* / mortality
  • Adenocarcinoma of Lung* / pathology
  • Biomarkers, Tumor* / genetics
  • Biomarkers, Tumor* / metabolism
  • Carcinoma, Non-Small-Cell Lung* / genetics
  • Carcinoma, Non-Small-Cell Lung* / metabolism
  • Carcinoma, Non-Small-Cell Lung* / mortality
  • Carcinoma, Non-Small-Cell Lung* / pathology
  • Computational Biology / methods
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / metabolism
  • Lung Neoplasms* / mortality
  • Lung Neoplasms* / pathology
  • Machine Learning*
  • Mendelian Randomization Analysis
  • Prognosis
  • Tumor Microenvironment / genetics

Substances

  • Biomarkers, Tumor