A machine learning tool for early identification of celiac disease autoimmunity

Sci Rep. 2024 Dec 28;14(1):30760. doi: 10.1038/s41598-024-80817-0.

Abstract

Identifying which patients should undergo serologic screening for celiac disease (CD) may help diagnose patients who otherwise often experience diagnostic delays or remain undiagnosed. Using anonymized outpatient data from the electronic medical records of Maccabi Healthcare Services, we developed and evaluated five machine learning models to classify patients as at risk for CD autoimmunity prior to first documented diagnosis or positive serum tissue transglutaminase IgA (tTG-IgA). A training set of highly seropositive cases (tTG-IgA > 10× the upper limit of normal [ULN]; n = 677) with likely CD and controls (n = 176,293) with no evidence of CD autoimmunity was used for model development. Input features included demographic information and commonly available laboratory results. The models were then evaluated for discriminative ability, as measured by the area under the receiver operating characteristic curve (AUC), on a distinct set of highly seropositive cases (n = 153) and controls (n = 41,087). The highest performing model was XGBoost (AUC = 0.86), followed by logistic regression (AUC = 0.85), random forest (AUC = 0.83), multilayer perceptron (AUC = 0.80) and decision tree (AUC = 0.77). Features contributing to the XGBoost model's classification of a patient as at risk for undiagnosed CD autoimmunity included signs of anemia, transaminitis and decreased high-density lipoprotein. This model's ability to distinguish cases of incident CD autoimmunity from controls shows promise as a clinical tool for identifying patients in the community who are at increased risk of undiagnosed celiac disease and who may benefit from serologic screening.
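
As an illustration of the AUC metric used to compare the models, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical. It relies on the standard interpretation of AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC (Mann-Whitney U statistic divided by n_cases * n_controls).

    labels: 1 for a case, 0 for a control (hypothetical encoding).
    scores: model risk scores, higher meaning more likely to be a case.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    # Sort by score so that position in the list gives the rank.
    pairs = sorted(zip(scores, labels))

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at index i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks (i+1 .. j).
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U for the cases, normalized by the number of case-control pairs.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Toy data (not from the study): two controls and two cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.2f}")
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which is how the reported values of 0.77 to 0.86 can be read.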

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Autoantibodies / blood
  • Autoantibodies / immunology
  • Autoimmunity*
  • Case-Control Studies
  • Celiac Disease* / blood
  • Celiac Disease* / diagnosis
  • Celiac Disease* / immunology
  • Child
  • Early Diagnosis
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Protein Glutamine gamma Glutamyltransferase 2
  • Transglutaminases / immunology
  • Young Adult

Substances

  • Transglutaminases
  • Autoantibodies
  • Protein Glutamine gamma Glutamyltransferase 2