Classification of NSCLC subtypes using lung microbiome from resected tissue based on machine learning methods

NPJ Syst Biol Appl. 2025 Jan 17;11(1):11. doi: 10.1038/s41540-025-00491-4.

Abstract

Classification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) poses significant challenges for cytopathologists, often necessitating clinical tests and biopsies that delay treatment initiation. To address this, we developed a machine learning-based approach that uses the microbiome of resected lung tissue from AC and SCC patients for subtype classification. Differentially enriched taxa were identified using LEfSe, revealing ten potential microbial markers. Linear discriminant analysis (LDA) was subsequently applied to enhance inter-class separability. Next, six supervised classification algorithms were benchmarked: logistic regression, naïve Bayes, random forest, extreme gradient boosting (XGBoost), k-nearest neighbors, and a deep neural network. Notably, XGBoost, with an accuracy of 76.25% and an AUROC (area under the receiver operating characteristic curve) of 0.81, with 69% specificity and 76% sensitivity, outperformed the other five classification algorithms when using LDA-transformed features. Validation on an independent dataset confirmed its robustness, with an AUROC of 0.71 and minimal false positives and false negatives. This study is the first to classify AC and SCC subtypes using the lung-tissue microbiome.
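
As an illustration of how the reported evaluation metrics (accuracy, sensitivity, specificity, AUROC) relate to a binary classifier's output, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function names are hypothetical, the choice of which subtype counts as the positive class (label 1) is an assumption the abstract does not state, and no data from the study are used.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from hard 0/1 predictions.

    Label 1 is treated as the positive class (an assumption; the abstract
    does not say whether AC or SCC plays that role).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as one half). Quadratic in sample count, which is fine
    for small cohorts.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity depend on the decision threshold applied to the classifier's scores, whereas AUROC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of figure.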

MeSH terms

  • Adenocarcinoma / microbiology
  • Aged
  • Algorithms*
  • Carcinoma, Non-Small-Cell Lung* / microbiology
  • Carcinoma, Squamous Cell / microbiology
  • Female
  • Humans
  • Lung Neoplasms* / microbiology
  • Lung* / microbiology
  • Machine Learning*
  • Male
  • Microbiota* / genetics
  • Middle Aged
  • ROC Curve