The objective was to identify individuals with undiagnosed prediabetes from administrative data using data adaptive techniques. The data source was an administrative data set from a national Medicare Advantage Prescription Drug (MAPD) plan. In this retrospective, cross-sectional study, data adaptive logistic regression, decision tree, neural network, and ensemble models predicting metabolic syndrome and prediabetes were developed and evaluated using 3 mutually exclusive cohorts (N = 279,903). The misclassification rate (MCR), average squared error (ASE), c-statistic, sensitivity (SN), and false positive (FP) rate were compared to select the final predictive models. MAPD individuals continuously enrolled from 2013 to 2014 were included. Metabolic syndrome and prediabetes were defined using clinical guidelines, diagnosis data, and laboratory data. A total of 512 candidate variables, selected through subject matter expertise and drawn from all available data, were evaluated for modeling. For predicting metabolic syndrome, the ensemble model outperformed the individual data adaptive modeling techniques, showing better discrimination (c-statistic 0.83, MCR 0.24, ASE 0.16), high SN, and a low FP rate. For predicting prediabetes, logistic regression outperformed the other adaptive modeling techniques and the ensemble methods, showing better discrimination (c-statistic 0.67, MCR 0.13, ASE 0.11), high SN, and a low FP rate. Applied to the scored data, the model predicted prediabetes in 44% of the MAPD population, comparable to the 41% prediabetes rate reported in the 2005-2006 National Health and Nutrition Examination Survey. The logistic regression model performed well in predicting undiagnosed prediabetes in MAPD individuals.
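The models above are compared on MCR, ASE, c-statistic, sensitivity, and FP rate. The following minimal Python sketch shows the conventional definitions of these measures. It assumes binary 0/1 outcomes, predicted probabilities for the positive class, a 0.5 classification cutoff, and an FP rate defined as FP / (FP + TN); the study's own cutoffs, definitions, and software may differ. The function name `classification_metrics` is hypothetical.

```python
from itertools import product


def classification_metrics(y_true, p_hat, threshold=0.5):
    """Compute MCR, ASE, c-statistic, sensitivity, and FP rate.

    y_true: list of 0/1 outcomes (e.g., 1 = prediabetes).
    p_hat: predicted probabilities of the positive class.
    threshold: assumed cutoff for labeling a case positive.
    """
    if not y_true or len(y_true) != len(p_hat):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in p_hat]

    # Confusion-matrix counts at the chosen cutoff.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yp in pairs if y == 1 and yp == 1)
    tn = sum(1 for y, yp in pairs if y == 0 and yp == 0)
    fp = sum(1 for y, yp in pairs if y == 0 and yp == 1)
    fn = sum(1 for y, yp in pairs if y == 1 and yp == 0)

    # MCR: share of cases assigned the wrong class.
    mcr = (fp + fn) / n
    # ASE: mean squared difference between outcome and predicted probability.
    ase = sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / n

    # c-statistic: share of (positive, negative) pairs in which the positive
    # case gets the higher predicted probability; ties count as one half.
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    scores = [1.0 if a > b else 0.5 if a == b else 0.0
              for a, b in product(pos, neg)]
    c_stat = sum(scores) / len(scores) if scores else float("nan")

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fp_rate = fp / (fp + tn) if fp + tn else float("nan")
    return {"mcr": mcr, "ase": ase, "c_statistic": c_stat,
            "sensitivity": sensitivity, "fp_rate": fp_rate}


if __name__ == "__main__":
    # Toy labels and probabilities, purely illustrative.
    y = [1, 0, 1, 0, 1, 0]
    p = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]
    for name, value in classification_metrics(y, p).items():
        print(f"{name}: {value:.3f}")
```

The pairwise c-statistic loop is quadratic in the number of records, so for cohorts the size of those described here a rank-based AUC routine would be the practical choice; the pairwise form is kept because it mirrors the definition directly.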
Keywords: ensemble method; metabolic syndrome; prediabetes.