Development and validation of a prognostic model for cervical cancer by combination of machine learning and high-throughput sequencing

Eur J Surg Oncol. 2024 Apr;50(4):108241. doi: 10.1016/j.ejso.2024.108241. Epub 2024 Mar 2.

Abstract

Background: Cervical cancer has the highest morbidity and mortality rates among tumors of the female reproductive tract. Treatment outcomes for patients with persistent, recurrent, or metastatic cervical cancer remain unsatisfactory, and comprehensive prognostic indicators are lacking. This study aims to develop a model that evaluates the prognosis of cervical cancer by combining high-throughput sequencing with multiple machine learning algorithms.

Methods: We combined two single-cell RNA sequencing (scRNA-seq) projects with TCGA data for cervical cancer to obtain shared differentially expressed genes (DEGs). LASSO regression and several learners were applied to select signature features. Six machine learning algorithms (Linear Discriminant Analysis, Naive Bayes, K Nearest Neighbors, Decision Tree, Random Forest, and eXtreme Gradient Boosting) were used to construct a prognostic model for cervical cancer. External validation was conducted using the CGCI-HTMCP-CC dataset, and the accuracy of the model was assessed by ROC curve analysis.
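The ROC-based assessment mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the area under the curve is computed with the standard pairwise (Mann-Whitney) identity rather than with any particular library.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with
    ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = poor outcome, 0 = good outcome.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. In a setup like the one described, the same measure would be computed on the external validation cohort for each of the six classifiers.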

Results: A prognostic model was successfully constructed from DEGs shared between bulk and scRNA-seq data. Ten genes (CXCL8, DLC1, GRN, MPLKIP, PRDX1, RUNX1, SNX3, TFRC, UBE2V2, and UQCRC1) were identified by feature selection and used for model construction. Random Forest exhibited the best performance in predicting the risk of cervical cancer. Patients in the high-risk group had worse overall survival than those in the low-risk group.
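The comparison of overall survival between risk groups can be sketched as follows. This is an illustration only: the abstract does not state how the risk cutoff was chosen, so the median split used here is an assumption, the helper names `split_by_median` and `kaplan_meier` are hypothetical, and the follow-up data are invented toy values.

```python
from statistics import median
from typing import List, Sequence, Tuple


def split_by_median(risk_scores: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median score.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (time, survival probability) at each event time.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort (not from the study): follow-up in months.
risk = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
months = [60, 48, 55, 12, 20, 30]
died = [0, 1, 0, 1, 1, 0]

groups = split_by_median(risk)
curves = {}
for name in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == name]
    curves[name] = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(name, curves[name])
```

In this toy cohort the high-risk curve falls earlier and further than the low-risk curve, which is the pattern the Results describe. A formal comparison would normally add a log-rank test, omitted here for brevity.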

Conclusion: Our model, based on DEGs from bulk-seq and scRNA-seq data, effectively evaluates the prognosis of cervical cancer and provides valuable insights for comprehensive clinical management.

Keywords: Bulk sequencing; Differentially expressed genes; Feature selection; Machine learning; Prognosis; Single-cell RNA sequencing.

MeSH terms

  • Adaptor Proteins, Signal Transducing
  • Bayes Theorem
  • Female
  • GTPase-Activating Proteins
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Machine Learning
  • Prognosis
  • Tumor Suppressor Proteins
  • Uterine Cervical Neoplasms* / genetics

Substances

  • DLC1 protein, human
  • GTPase-Activating Proteins
  • Tumor Suppressor Proteins
  • MPLKIP protein, human
  • Adaptor Proteins, Signal Transducing