A machine learning approach to predict HPV positivity of oropharyngeal squamous cell carcinoma

Pathologica. 2024 Dec;116(6):379-389. doi: 10.32074/1591-951X-1027.

Abstract

HPV status is an important prognostic factor in oropharyngeal squamous cell carcinoma (OPSCC), with HPV-positive tumors associated with better overall survival. HPV status is currently determined by immunohistochemistry for expression of the p16INK4a protein, which must be accompanied by molecular testing for the presence of viral DNA. We aim to define a criterion based on image analysis and machine learning to predict HPV status from hematoxylin/eosin-stained tissue.

We extracted 41 morphometric and colorimetric features from each tumor cell identified in two cohorts of tumor tissue, one obtained from The Cancer Genome Atlas and one from the archives of Pathological Anatomy at the University of Naples Federico II. On these data, we built a random forest classifier, which achieved 90% accuracy. We also examined variable importance to define a criterion that supports the explainability of the model. Prediction of the molecular state of a neoplastic cell from digitally extracted morphometric features is fascinating and promises to revolutionize histopathology. We have built a classifier capable of anticipating the results of p16 immunohistochemistry and molecular testing for the HPV status of oropharyngeal squamous cell carcinomas by analyzing hematoxylin/eosin staining.
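The workflow described above (a random forest trained on 41 per-cell features, followed by a look at variable importance) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic random data in place of real morphometric and colorimetric measurements, and the hyperparameters, the train/test split, and the choice of impurity-based importance are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

N_FEATURES = 41  # number of per-cell features stated in the abstract

# Synthetic stand-in data (assumption): one row per tumor cell.
rng = np.random.default_rng(0)
n_cells = 2000
X = rng.normal(size=(n_cells, N_FEATURES))

# Synthetic HPV label (assumption): driven mostly by features 0 and 1,
# plus noise, so that the importance ranking has something to find.
noise = rng.normal(scale=0.5, size=n_cells)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Hold out a quarter of the cells for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters here are illustrative, not taken from the paper.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))

# Impurity-based importances: one value per feature, summing to 1.
importances = clf.feature_importances_
top = np.argsort(importances)[::-1][:5]  # indices of the 5 top features

print(f"held-out accuracy: {accuracy:.2f}")
for idx in top:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
```

On real data, the columns of `X` would be the features measured for each segmented cell and `y` the HPV status of the tumor the cell came from. Permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative to the impurity-based ranking shown here.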

Keywords: Computational Pathology; Digital Pathology; HPV; HistoQC; Machine Learning; OPSCC; QuPath.

MeSH terms

  • Carcinoma, Squamous Cell / pathology
  • Carcinoma, Squamous Cell / virology
  • Cyclin-Dependent Kinase Inhibitor p16* / analysis
  • DNA, Viral / analysis
  • Humans
  • Immunohistochemistry
  • Machine Learning*
  • Male
  • Oropharyngeal Neoplasms* / pathology
  • Oropharyngeal Neoplasms* / virology
  • Papillomaviridae / genetics
  • Papillomaviridae / isolation & purification
  • Papillomavirus Infections* / complications
  • Papillomavirus Infections* / diagnosis
  • Papillomavirus Infections* / pathology
  • Papillomavirus Infections* / virology
  • Squamous Cell Carcinoma of Head and Neck / pathology
  • Squamous Cell Carcinoma of Head and Neck / virology

Substances

  • Cyclin-Dependent Kinase Inhibitor p16
  • DNA, Viral
  • CDKN2A protein, human