Diagnostic potential of salivary microbiota in persistent pulmonary nodules: identifying biomarkers and functional pathways using 16S rRNA sequencing and machine learning

J Transl Med. 2024 Nov 28;22(1):1079. doi: 10.1186/s12967-024-05802-7.

Abstract

Background: The aim of this study was to explore the microbial variations and biomarkers in the oral environment of patients with persistent pulmonary nodules (pPNs) and to reveal the potential biological functions of the salivary microbiota in pPNs.

Materials and methods: This study included a total of 483 participants (141 healthy controls and 342 patients with pPNs) enrolled between June 2022 and January 2024. Saliva samples were subjected to sequencing of the V3-V4 region of the 16S rRNA gene to assess microbial diversity and differential abundance. Seven machine learning algorithms (logistic regression, support vector machine, multi-layer perceptron, naïve Bayes, random forest, gradient boosting decision tree, and LightGBM) were used to evaluate predictive performance and identify key microorganisms, with fivefold cross-validation employed to assess robustness. The Shapley Additive exPlanations (SHAP) algorithm was employed to explain the contribution of these core microbiota to the predictive model. Additionally, the PICRUSt2 algorithm was used to predict microbial functions.
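The fivefold cross-validation step can be illustrated with a minimal sketch. The abstract does not say how the folds were built, so stratified splitting (keeping the control-to-patient ratio similar in every fold) is an assumption here, and `stratified_kfold` is a hypothetical helper written for illustration, not the authors' code. Only the cohort sizes come from the abstract.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx

# Cohort sizes from the abstract: 141 healthy controls (0) and 342 pPNs patients (1).
labels = [0] * 141 + [1] * 342
splits = list(stratified_kfold(labels, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    n_cases = sum(labels[i] for i in test_idx)
    n_controls = len(test_idx) - n_cases
    print(f"fold {fold_no}: test n={len(test_idx)}, pPNs={n_cases}, controls={n_controls}")
```

Each candidate model would be fitted on `train_idx` and scored on `test_idx`, and the five scores averaged. With an imbalanced cohort like this one, stratifying keeps any single fold from being dominated by one class.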

Results: The salivary microbial composition in the pPNs group showed significantly lower α- and β-diversity compared to healthy controls. A high-accuracy LightGBM model was developed, identifying six core genera (Fusobacterium, Solobacterium, Actinomyces, Porphyromonas, Atopobium, and Peptostreptococcus) as pPNs biomarkers. Additionally, a visual pPNs risk prediction system was developed. Differences in the immune responses and metabolic activities of the salivary microbiota between patients with pPNs and healthy controls were revealed.
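For readers unfamiliar with the diversity measures, the sketch below computes one α-diversity index and one β-diversity dissimilarity. The abstract does not state which indices were used, so the Shannon index and Bray-Curtis dissimilarity are assumed here as common choices, and the read counts are made up for illustration rather than taken from the study.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

# Hypothetical genus-level read counts for two saliva samples (not study data).
sample_even = [25, 25, 25, 25]
sample_skewed = [85, 5, 5, 5]

h_even = shannon_index(sample_even)      # within-sample (alpha) diversity
h_skewed = shannon_index(sample_skewed)  # lower: one genus dominates
bc = bray_curtis(sample_even, sample_skewed)  # between-sample (beta) dissimilarity

print(f"Shannon (even):   {h_even:.3f}")
print(f"Shannon (skewed): {h_skewed:.3f}")
print(f"Bray-Curtis:      {bc:.3f}")
```

A sample dominated by a single genus scores lower on the Shannon index than an evenly distributed one, which is the kind of within-sample difference an α-diversity comparison captures.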

Conclusions: This study highlights the potential clinical application of the salivary microbiota to enable earlier detection and targeted interventions, offering significant promise for advancing clinical management and improving patient outcomes in pPNs.

Keywords: 16S rRNA sequencing; Biomarkers; Lung cancer; Machine learning; Microbiota; Persistent pulmonary nodules.

MeSH terms

  • Adult
  • Biomarkers* / metabolism
  • Case-Control Studies
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Microbiota* / genetics
  • Middle Aged
  • Multiple Pulmonary Nodules / diagnosis
  • Multiple Pulmonary Nodules / microbiology
  • RNA, Ribosomal, 16S* / genetics
  • Saliva* / microbiology

Substances

  • RNA, Ribosomal, 16S
  • Biomarkers