Accurate prediction of colorectal cancer diagnosis using machine learning based on immunohistochemistry pathological images

Sci Rep. 2024 Dec 2;14(1):29882. doi: 10.1038/s41598-024-76083-9.

Abstract

Colorectal cancer (CRC) is the third most prevalent cancer and the second leading cause of cancer-related mortality. Early and accurate diagnosis is important for improving patient treatment and prognosis. Machine learning and bioinformatics have provided novel approaches to cancer diagnosis. This study aims to develop a CRC diagnostic model based on immunohistochemical staining image features using machine learning methods. First, CRC-specific genes were identified through bioinformatics analysis, SVM-RFE, and Random Forest algorithms, using RNA-seq data from the GEO and TCGA databases. These genes were then verified using proteomics data from the CPTAC and HPA databases, which identified four target proteins (AKR1B10, CA2, DHRS9, and ZG16) for further investigation. SVM and CNN were then employed to analyze and integrate the features of immunohistochemical images and to construct a reliable CRC diagnostic model. During model training and validation, cross-validation and external validation were used to ensure accuracy and reliability. The results demonstrate that the diagnostic model performs excellently in distinguishing CRC from normal controls (accuracy: 0.999), indicating potential for clinical application. These findings are expected to provide new perspectives and methodologies for personalized diagnosis of CRC and more precise guidance for treatment.
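
The gene-selection step named in the abstract (SVM-RFE plus Random Forest) can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline. The data are synthetic, the gene labels are hypothetical, the number of genes kept (`N_KEEP`) is an arbitrary choice, and taking the intersection of the two selections is an assumption, since the abstract does not say how the two methods were combined.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); NOT the study's data.
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
gene_names = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical labels
N_KEEP = 8  # assumed number of genes retained by each method


def svm_rfe_genes(X, y, names, n_keep):
    """SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight."""
    selector = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=1)
    selector.fit(X, y)
    return set(names[selector.support_])


def random_forest_genes(X, y, names, n_keep):
    """Keep the n_keep genes with the highest impurity-based importance."""
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    top = np.argsort(forest.feature_importances_)[::-1][:n_keep]
    return set(names[top])


# Assumption: candidate genes are those selected by both methods.
svm_set = svm_rfe_genes(X, y, gene_names, N_KEEP)
rf_set = random_forest_genes(X, y, gene_names, N_KEEP)
candidates = sorted(svm_set & rf_set)
print("candidate genes:", candidates)

if candidates:
    # 5-fold cross-validated accuracy of a linear SVM on the candidate genes.
    # Caveat: the genes were selected on the full data set, so this estimate
    # is optimistic; a stricter design repeats the selection inside each fold.
    cols = np.isin(gene_names, candidates)
    scores = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=5)
    print("mean CV accuracy:", scores.mean())
```

External validation, as mentioned in the abstract, would additionally score the fitted model on a cohort that played no part in selection or training.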

Keywords: Colorectal cancer; Diagnosis; Immunohistochemistry; Machine learning.

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / genetics
  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / genetics
  • Colorectal Neoplasms* / metabolism
  • Colorectal Neoplasms* / pathology
  • Computational Biology / methods
  • Humans
  • Immunohistochemistry* / methods
  • Machine Learning*
  • Proteomics / methods
  • Support Vector Machine

Substances

  • Biomarkers, Tumor