A large annotated cervical cytology images dataset for AI models to aid cervical cancer screening

Sci Data. 2025 Jan 7;12(1):23. doi: 10.1038/s41597-025-04374-5.

Abstract

Accurate detection of abnormal cervical cells in cervical cancer screening increases the chances of timely treatment. The rapid development of deep learning methods has reshaped cervical cancer screening, and many studies have shown that these methods improve the efficiency and accuracy of cell detection. Although many studies have contributed to this field, the scarcity of public datasets and the time-consuming effort of data collection may limit the generalization performance of these advanced models and restrict further research. In this work, we provide a large dataset of cervical cytology images with exhaustive annotations of abnormal cervical cells. The dataset consists of 8,037 images derived from 129 scanned ThinPrep cytologic test (TCT) slide images. Furthermore, we performed evaluation experiments to demonstrate the abnormal cell detection performance of representative models trained on our dataset.

Publication types

  • Dataset

MeSH terms

  • Cervix Uteri / diagnostic imaging
  • Cervix Uteri / pathology
  • Deep Learning
  • Early Detection of Cancer*
  • Female
  • Humans
  • Uterine Cervical Neoplasms* / diagnostic imaging
  • Uterine Cervical Neoplasms* / pathology