Automatic segmentation and classification of Papanicolaou-stained cells and dataset for oral cancer detection

Comput Biol Med. 2024 Sep:180:108967. doi: 10.1016/j.compbiomed.2024.108967. Epub 2024 Aug 6.

Abstract

Background and objective: Papanicolaou staining has been used successfully to assist early detection of cervical cancer for several decades. We postulate that this staining technique can also assist early detection of oral cancer, which is responsible for about 300,000 deaths every year. The rationale for this claim rests on two key observations: (i) nuclear atypia, i.e., changes in volume, shape, and staining properties of the cell nuclei, can be linked to rapid cell proliferation and genetic instability; and (ii) Papanicolaou staining allows one to reliably segment the nuclei and cytoplasm of cells. While Papanicolaou staining is an attractive tool due to its low cost, its interpretation requires a trained pathologist. Our goal is to automate the segmentation and classification of the morphological features needed to evaluate the use of Papanicolaou staining for early detection of oral cancer.

Methods: We built a convolutional neural network (CNN) for automatic segmentation and classification of cells in Papanicolaou-stained images. Our CNN was trained and evaluated on a new image dataset of cells from oral mucosa consisting of 1,563 Full HD images from 52 patients, annotated by specialists. The effectiveness of our model was evaluated against a group of experts. Its robustness was also demonstrated on five public datasets of cervical images captured with different microscopes and cameras, and having different resolutions, colors, background intensities, and noise levels.
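As an illustration of how a segmentation result can feed a morphological feature, the sketch below computes a nucleus-to-cytoplasm area ratio from a per-pixel label mask. This is a minimal sketch under stated assumptions: the label convention (0 = background, 1 = cytoplasm, 2 = nucleus), the function name, and the choice of this particular feature are hypothetical and are not taken from the paper.

```python
# Hypothetical label convention (an assumption, not the paper's):
# 0 = background, 1 = cytoplasm, 2 = nucleus.
BACKGROUND, CYTOPLASM, NUCLEUS = 0, 1, 2


def nucleus_to_cytoplasm_ratio(mask):
    """Return nucleus area / cytoplasm area (in pixels) for one segmented cell.

    `mask` is a list of rows, each row a list of integer labels.
    """
    nucleus = sum(row.count(NUCLEUS) for row in mask)
    cytoplasm = sum(row.count(CYTOPLASM) for row in mask)
    if cytoplasm == 0:
        raise ValueError("mask contains no cytoplasm pixels")
    return nucleus / cytoplasm


# Toy 5x5 mask of a single cell: 5 nucleus pixels, 16 cytoplasm pixels.
toy_mask = [
    [0, 1, 1, 1, 0],
    [1, 1, 2, 1, 1],
    [1, 2, 2, 2, 1],
    [1, 1, 2, 1, 1],
    [0, 1, 1, 1, 0],
]
ratio = nucleus_to_cytoplasm_ratio(toy_mask)
print(ratio)
```

An enlarged nucleus relative to the cytoplasm raises this ratio, which is one simple way that a change in nuclear size could be quantified once cells are segmented.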

Results: Our CNN model achieved expert-level performance in a comparison with a group of three human experts on a set of 400 Papanicolaou-stained images of the oral mucosa from 20 patients. The results of this experiment exhibited high Intraclass Correlation Coefficient (ICC) values. Despite being trained only on images from the oral mucosa, the model produced high-quality segmentation and plausible classification for five public datasets of cervical cells. Our Papanicolaou-stained image dataset is the most diverse publicly available image dataset for the oral mucosa in terms of number of patients.
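For readers unfamiliar with the ICC as an agreement measure, the sketch below computes one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a subjects-by-raters score matrix. The abstract does not state which ICC form the authors used, so the choice of ICC(2,1) is an assumption, and the scores are illustrative values, not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows; each row holds one subject's scores,
    one column per rater.
    """
    n = len(scores)      # number of subjects (e.g. images)
    k = len(scores[0])   # number of raters (e.g. experts plus the model)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: 6 subjects rated by 4 raters (not study data).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(round(icc, 2))
```

Values near 1 indicate that raters assign nearly the same scores to each subject. In this toy matrix the raters rank subjects similarly but differ systematically in level, so the absolute-agreement ICC is low.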

Conclusion: Our solution provides the means for exploring the potential of Papanicolaou staining as a powerful and inexpensive tool for early detection of oral cancer. We are currently using our system to detect suspicious cells and cell clusters in oral mucosa slide images. Our trained model, code, and dataset are available; they can help practitioners and stimulate research in early oral cancer detection.

Keywords: Automatic segmentation and classification; CNN; Oral cancer detection; Papanicolaou-stained cells.

MeSH terms

  • Early Detection of Cancer / methods
  • Female
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Mouth Neoplasms* / diagnostic imaging
  • Mouth Neoplasms* / pathology
  • Neural Networks, Computer
  • Papanicolaou Test*
  • Staining and Labeling / methods