Fairness of Machine Learning Algorithms for Predicting Foregone Preventive Dental Care for Adults

JAMA Netw Open. 2023 Nov 1;6(11):e2341625. doi: 10.1001/jamanetworkopen.2023.41625.

Abstract

Importance: Access to routine dental care prevents advanced dental disease and improves oral and overall health. Identifying individuals at risk of foregoing preventive dental care can direct prevention efforts toward high-risk populations.

Objective: To predict foregone preventive dental care among adults overall and in sociodemographic subgroups and to assess the algorithmic fairness of the resulting models.

Design, setting, and participants: This prognostic study was a secondary analysis of longitudinal panel data from the US Medical Expenditure Panel Survey (MEPS) from 2016 to 2019, with each panel having 2 years of follow-up. Participants included adults aged 18 years and older. Data analysis was performed from December 2022 to June 2023.

Exposure: A total of 50 predictors, including demographic and socioeconomic characteristics, health conditions, behaviors, and health services use, were assessed.

Main outcomes and measures: The outcome of interest was foregoing preventive dental care in the past year, with preventive dental care defined as a cleaning, a general examination, or an appointment with a dental hygienist.

Results: Among 32 234 participants, the mean (SD) age was 48.5 (18.2) years and 17 386 participants (53.9%) were female; 1935 participants (6.0%) were Asian, 5138 participants (15.9%) were Black, 7681 participants (23.8%) were Hispanic, 16 503 participants (51.2%) were White, and 977 participants (3.0%) identified as other (eg, American Indian and Alaska Native) or multiple racial or ethnic groups. There were 21 083 (65.4%) individuals who missed preventive dental care in the past year. The algorithms demonstrated high performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.84 (95% CI, 0.84-0.85) in the overall population. While the full sample model performed similarly when applied to White individuals and older adults (AUC, 0.88; 95% CI, 0.87-0.90), there was a loss of performance for other subgroups. Removing the subgroup-sensitive predictors (ie, race and ethnicity, age, and income) did not impact model performance. Models stratified by race and ethnicity performed similarly or worse than the full model for all groups, with the lowest performance for individuals who identified as other or multiple racial groups (AUC, 0.76; 95% CI, 0.70-0.81). Previous pattern of dental visits, health care utilization, dental benefits, and sociodemographic characteristics were the highest contributing predictors to the models' performance.
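The subgroup comparison reported above rests on computing the AUC separately within each sociodemographic group and comparing it with the overall value. The following is a minimal sketch of that idea, not the authors' implementation: the function names `auc` and `auc_by_group` and the toy labels, scores, and group codes are hypothetical, and the AUC is computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as half).

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Return the overall AUC plus one AUC per subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    result = {"overall": auc(labels, scores)}
    for g, (ys, ss) in buckets.items():
        result[g] = auc(ys, ss)
    return result


# Toy data (hypothetical): 1 = foregone preventive care, 0 = received care.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = auc_by_group(labels, scores, groups)
for name, value in report.items():
    print(f"{name}: AUC = {value:.4f}")
```

In this toy example the model separates group A perfectly but ranks group B poorly, so the overall AUC hides a large gap between subgroups. A real evaluation would also need confidence intervals (for example, by bootstrapping) before drawing conclusions about differences between groups.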

Conclusions and relevance: Findings of this prognostic study using cohort data suggest that tree-based ensemble machine learning models could accurately identify adults at risk of foregoing preventive dental care but demonstrated bias against underrepresented sociodemographic groups. These results highlight the importance of evaluating model fairness during development and testing to avoid exacerbating existing biases.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Algorithms
  • Dental Care
  • Ethnicity*
  • Humans
  • Machine Learning
  • Racial Groups*