Objective: The present study aimed to verify the classification performance of deep learning (DL) models for diagnosing fractures of the mandibular condyle on panoramic radiographs using data sets from two hospitals and to compare their internal and external validities.
Methods: Panoramic radiographs of 100 condyles with and without fractures were collected from two hospitals, and a fivefold cross-validation method was employed to construct and evaluate the DL models. The internal and external validities of classification performance were evaluated in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
Results: For internal validity, high classification performance was obtained, with AUC values of >0.85. Conversely, external validation across the two hospitals' data sets yielded low classification performance. With the data sets from both hospitals combined, the DL model exhibited high performance that was equal to or slightly higher than that obtained for internal validity, although the difference was not statistically significant.
Conclusion: The constructed DL model can be clinically employed for diagnosing fractures of the mandibular condyle using panoramic radiographs. However, the domain shift phenomenon should be considered when generalizing DL systems.
Keywords: Artificial intelligence; Deep learning; Mandibular condyle; Mandibular fracture; Panoramic radiography.
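
Illustrative note: the four performance measures named in the Methods (accuracy, sensitivity, specificity, and AUC) can be computed from binary labels and model scores roughly as in the minimal sketch below. This is not the study's evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration, and the sketch assumes both classes (fracture and no fracture) are present in the evaluated set.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = fracture)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 3 fractured, 3 intact condyles.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Under this sketch, internal validity would correspond to computing these measures on held-out folds from the same hospital's data, and external validity to computing them on the other hospital's data.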