Deep generative AI models analyzing circulating orphan non-coding RNAs enable detection of early-stage lung cancer

Nat Commun. 2024 Nov 21;15(1):10090. doi: 10.1038/s41467-024-53851-9.

Abstract

Liquid biopsies have the potential to revolutionize cancer care through non-invasive early detection of tumors. Developing a robust liquid biopsy test requires collecting high-dimensional data from a large number of blood samples across heterogeneous groups of patients. We propose that the generative capability of variational autoencoders enables learning a robust and generalizable signature of blood-based biomarkers. In this study, we analyze orphan non-coding RNAs (oncRNAs) from serum samples of 1050 individuals diagnosed with non-small cell lung cancer (NSCLC) at various stages, as well as sex-, age-, and BMI-matched controls. We demonstrate that our multi-task generative AI model, Orion, surpasses commonly used methods in both overall performance and generalizability to held-out datasets. Orion achieves an overall sensitivity of 94% (95% CI: 87%-98%) at 87% (95% CI: 81%-93%) specificity for cancer detection across all stages, outperforming the sensitivity of other methods on held-out validation datasets by more than ~30%.
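The headline metric above, sensitivity reported at a fixed specificity with a 95% confidence interval, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how the intervals were computed, so the Wilson score interval is an assumption, and the function names and scores are hypothetical and invented for illustration only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def threshold_at_specificity(control_scores, target_specificity):
    """Lowest cutoff at which at least the target share of controls
    score at or below it (a sample is called positive if score > cutoff)."""
    ordered = sorted(control_scores)
    k = math.ceil(target_specificity * len(ordered))
    return ordered[max(k, 1) - 1]


def sensitivity_at_specificity(case_scores, control_scores, target_specificity):
    """Sensitivity and specificity, each with a Wilson interval, at the cutoff
    chosen from the control scores."""
    cutoff = threshold_at_specificity(control_scores, target_specificity)
    true_pos = sum(score > cutoff for score in case_scores)
    true_neg = sum(score <= cutoff for score in control_scores)
    return {
        "cutoff": cutoff,
        "sensitivity": true_pos / len(case_scores),
        "sensitivity_ci": wilson_interval(true_pos, len(case_scores)),
        "specificity": true_neg / len(control_scores),
        "specificity_ci": wilson_interval(true_neg, len(control_scores)),
    }


# Hypothetical classifier scores (higher = more cancer-like); not study data.
controls = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.70]
cases = [0.30, 0.45, 0.55, 0.65, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99]

result = sensitivity_at_specificity(cases, controls, target_specificity=0.8)
print(result)
```

In this sketch the cutoff is chosen on the same controls used to report specificity. An evaluation on held-out datasets, as the abstract describes, would instead fix the cutoff on training data and apply it unchanged to the validation samples.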

MeSH terms

  • Aged
  • Artificial Intelligence
  • Biomarkers, Tumor* / blood
  • Biomarkers, Tumor* / genetics
  • Carcinoma, Non-Small-Cell Lung* / blood
  • Carcinoma, Non-Small-Cell Lung* / diagnosis
  • Carcinoma, Non-Small-Cell Lung* / genetics
  • Deep Learning
  • Early Detection of Cancer* / methods
  • Female
  • Humans
  • Liquid Biopsy / methods
  • Lung Neoplasms* / blood
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / pathology
  • Male
  • Middle Aged
  • Neoplasm Staging
  • RNA, Untranslated / blood
  • RNA, Untranslated / genetics
  • Sensitivity and Specificity

Substances

  • Biomarkers, Tumor
  • RNA, Untranslated