Background: Using data mining, we examined a large number of variables to generate new hypotheses about a wider range of risk factors for anophthalmia/microphthalmia.
Methods: Data were from the National Birth Defects Prevention Study, a multicentre, case-control study from 10 centres in the United States. There were 134 "isolated" and 87 "nonisolated" (with other major birth defects) cases of anophthalmia/microphthalmia and 11 052 nonmalformed controls with delivery dates from October 1997 to December 2011. Using random forest, a data mining procedure, we compared each of the two case types with controls across 201 variables. Variables ranked as important by random forest were included in a multivariable logistic regression model to estimate odds ratios and 95% confidence intervals.
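As a reading aid for the odds ratios and 95% confidence intervals mentioned above, the following minimal sketch computes an unadjusted odds ratio with a Wald (Woolf) interval from a 2x2 exposure table. It is not the study's method: the study used multivariable logistic regression, which adjusts for other variables. The function name `odds_ratio_ci` and the exposure counts are hypothetical and chosen only for illustration; only the group totals (134 isolated cases, 11 052 controls) come from the text.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    z=1.96 gives an approximate 95% interval. This is a simplified
    illustration, not the multivariable model used in the study.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")

    # Odds of exposure among cases divided by odds among controls.
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)

    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical exposure split: 30 of 134 cases and 1500 of 11 052
    # controls exposed. These counts are invented for illustration.
    or_, lo, hi = odds_ratio_ci(30, 104, 1500, 9552)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 suggests an association at roughly the 5% level; adjusted estimates from a regression model can differ from this crude calculation.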
Results: Predictors for isolated cases included paternal race/ethnicity, maternal intake of certain nutrients and foods, and childhood health problems in relatives. Using regression, inverse associations were observed with greater maternal education and with increasing intake of folate and potatoes. Odds were slightly higher with greater paternal education, with increased intake of carbohydrates and beans, and when relatives had a childhood health problem. For nonisolated cases, predictors included paternal race/ethnicity, maternal intake of certain nutrients, and smoking in the home the month before conception. Odds were higher for Hispanic fathers and for smoking in the home and NSAID use in the month before conception.
Conclusions: Results appear to support previously hypothesised risk factors (socio-economic status, NSAID use, and inadequate folate intake) and potentially identify new areas to explore, such as passive smoking before pregnancy and paternal education and ethnicity, for a better understanding of anophthalmia/microphthalmia.
Keywords: anophthalmia; birth defects; data mining; microphthalmia; random forest.
© 2018 John Wiley & Sons Ltd.