Machine learning-based analysis identifies and validates serum exosomal proteomic signatures for the diagnosis of colorectal cancer

Cell Rep Med. 2024 Aug 20;5(8):101689. doi: 10.1016/j.xcrm.2024.101689.

Abstract

The potential of serum extracellular vesicles (EVs) as non-invasive biomarkers for diagnosing colorectal cancer (CRC) remains unclear. We employed an in-depth 4D-DIA proteomics and machine learning (ML) pipeline to identify two key proteins, PF4 and AACT, for CRC diagnosis in serum EV samples from a discovery cohort of 37 cases. In an ELISA-based evaluation of 912 individuals, PF4 and AACT outperformed the traditional biomarkers CEA and CA19-9. Furthermore, we developed an EV-related random forest (RF) model with the highest diagnostic efficiency, achieving area under the curve (AUC) values of 0.960 and 0.963 in the training and test sets, respectively. Notably, this model demonstrated reliable diagnostic performance both for detecting early-stage CRC and for distinguishing CRC from benign colorectal diseases. Additionally, multi-omics approaches were employed to predict the functions and potential sources of serum EV-derived proteins. Collectively, our study identified crucial proteomic signatures in serum EVs and established a promising EV-related RF model for the clinical diagnosis of CRC.
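For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how an AUC can be computed from a classifier's predicted scores, using the rank-based (Mann-Whitney) formulation. It is not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CRC case, 0 = control (not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every case is scored above every control, while 0.5 corresponds to chance-level ranking.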

Keywords: biomarker; colorectal cancer; diagnosis; extracellular vesicles; machine learning.

MeSH terms

  • Aged
  • Biomarkers, Tumor* / blood
  • Colorectal Neoplasms* / blood
  • Colorectal Neoplasms* / diagnosis
  • Exosomes* / metabolism
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Proteome / analysis
  • Proteome / metabolism
  • Proteomics* / methods

Substances

  • Biomarkers, Tumor
  • Proteome