A dental intraoral image dataset of gingivitis for image captioning

Data Brief. 2024 Sep 19;57:110960. doi: 10.1016/j.dib.2024.110960. eCollection 2024 Dec.

Abstract

Image captioning, which integrates computer vision and natural language processing to generate a description of an image, is one of the most prominent topics in Artificial Intelligence (AI). In this paper, we present a new dataset designed specifically for image captioning in gingivitis diagnosis using deep learning. It includes 1,096 high-resolution intraoral images of 12 anterior teeth and surrounding gingival tissue, collected under controlled conditions with professional-grade photography equipment. Each image comes with detailed labels and descriptive captions. Three periodontists, each with over ten years of experience, assigned Modified Gingival Index (MGI) scores to each tooth in the images, achieving high inter-rater reliability through a rigorous calibration process. The same periodontists then wrote the captions, which offer diverse descriptions of gingivitis severity and location. The dataset is organized into training, validation, and testing subsets for ease of access. It supports the development of advanced image captioning algorithms and is a valuable educational resource for integrating real-world data into dental research and curriculum.

Keywords: Annotation; Caption generation from images; Computer vision; Deep learning; Gum disease diagnosis.
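
To make the dataset description above more concrete, the following is a minimal sketch of how one record might be represented and checked in code. It is not the dataset's actual file format: the class name `IntraoralRecord`, its field names, the split labels, and the helper `severity_histogram` are all hypothetical, and the sample values in the usage note are invented for illustration. The only facts taken from the abstract are that each image covers 12 anterior teeth, that each tooth has an MGI score, and that images are grouped into training, validation, and testing subsets. The 0 to 4 score range is that of the Modified Gingival Index scale.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# The Modified Gingival Index is scored on an integer scale from 0 to 4.
MGI_MIN, MGI_MAX = 0, 4
# The abstract states that each image shows 12 anterior teeth.
TEETH_PER_IMAGE = 12
# Hypothetical split labels; the real dataset may name its subsets differently.
VALID_SPLITS = {"train", "val", "test"}


@dataclass
class IntraoralRecord:
    """Hypothetical in-memory layout for one annotated image."""
    image_path: str        # assumed: path to the intraoral photograph
    mgi_scores: List[int]  # assumed: one MGI score per tooth, 12 in total
    caption: str           # descriptive caption written by a periodontist
    split: str             # assumed: one of VALID_SPLITS

    def __post_init__(self) -> None:
        # Reject records that do not match the structure described above.
        if len(self.mgi_scores) != TEETH_PER_IMAGE:
            raise ValueError(f"expected {TEETH_PER_IMAGE} MGI scores")
        if any(not MGI_MIN <= s <= MGI_MAX for s in self.mgi_scores):
            raise ValueError(f"MGI scores must lie in [{MGI_MIN}, {MGI_MAX}]")
        if self.split not in VALID_SPLITS:
            raise ValueError(f"unknown split: {self.split!r}")


def severity_histogram(records: List[IntraoralRecord]) -> Counter:
    """Count how often each MGI score occurs across all teeth in `records`."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.mgi_scores)
    return counts
```

As a usage example with made-up values, `IntraoralRecord("img_0001.jpg", [0, 1, 1, 2, 0, 0, 3, 1, 0, 0, 2, 4], "Illustrative caption text.", "train")` passes validation, and passing a list of such records to `severity_histogram` gives a per-score count that can be used to inspect class balance within a subset before training a captioning model.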