HPV status is an important prognostic factor in oropharyngeal squamous cell carcinoma (OPSCC), and HPV-positive tumors are associated with better overall survival. HPV status is currently determined by immunohistochemistry for the p16INK4a protein, which must be combined with molecular testing for the presence of viral DNA. We aimed to define a criterion, based on image analysis and machine learning, that predicts HPV status from hematoxylin and eosin (H&E)-stained slides.
We extracted 41 morphometric and colorimetric features from each tumor cell identified in two cohorts of tumor tissue: one from The Cancer Genome Atlas (TCGA) and one from the Pathological Anatomy archives of the University of Naples Federico II. On these data we built a random forest classifier, which achieved 90% accuracy. We also analyzed variable importance to derive a criterion that supports the explainability of the model. Predicting the molecular state of a neoplastic cell from digitally extracted morphometric features is a promising approach that could transform histopathology. We have built a classifier that, from H&E staining alone, anticipates the results of the p16 immunohistochemistry and molecular testing used to assess the HPV status of oropharyngeal squamous cell carcinomas.
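The workflow summarized above (a fixed-length feature vector per cell, a random forest classifier, then a ranking of variable importance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, which the abstract does not name; the data are synthetic stand-ins generated with `make_classification`; each cell is assumed to carry the HPV label of its case; and the helper names `train_cell_classifier` and `top_features`, the 200 trees, and the 25% hold-out split are hypothetical choices. Only the figure of 41 features per cell comes from the text.

```python
# Hypothetical sketch: synthetic data stand in for the per-cell features;
# scikit-learn is an assumed tool choice, not confirmed by the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

N_FEATURES = 41  # morphometric + colorimetric features per cell (from the abstract)


def train_cell_classifier(X, y, seed=0):
    """Fit a random forest on per-cell feature vectors; return (model, held-out accuracy)."""
    # Assumption: a simple stratified 75/25 split at the cell level.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    return model, acc


def top_features(model, names, k=5):
    """Rank features by impurity-based importance (mean decrease in impurity)."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


if __name__ == "__main__":
    # Synthetic stand-in: 2000 "cells"; label 1 = HPV-positive, 0 = HPV-negative.
    X, y = make_classification(n_samples=2000, n_features=N_FEATURES,
                               n_informative=8, random_state=42)
    names = [f"feature_{i}" for i in range(N_FEATURES)]  # hypothetical feature names
    model, acc = train_cell_classifier(X, y)
    print(f"held-out accuracy: {acc:.2f}")
    for name, imp in top_features(model, names):
        print(name, round(imp, 3))
```

On real slides, cells from the same patient are strongly correlated. A cell-level random split can therefore leak information between the training and test sets, and splitting by case (for example with scikit-learn's `GroupShuffleSplit`) gives a more honest accuracy estimate.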
Keywords: Computational Pathology; Digital Pathology; HPV; HistoQC; Machine Learning; OPSCC; QuPath.
Copyright © 2024 Società Italiana di Anatomia Patologica e Citopatologia Diagnostica, Divisione Italiana della International Academy of Pathology.