Machine learning analysis with population data for prepregnancy and perinatal risk factors for the neurodevelopmental delay of offspring

Sci Rep. 2024 Jun 18;14(1):13993. doi: 10.1038/s41598-024-64590-8.

Abstract

Neurodevelopmental disorders (NDD) in offspring are associated with a complex combination of pre- and postnatal factors. This study uses machine learning and population data to evaluate the association between prepregnancy or perinatal risk factors and NDD in offspring. Population-based retrospective cohort data were obtained from Korea National Health Insurance Service claims data for 209,424 singleton offspring and their mothers who gave birth for the first time in 2007. The dependent variables were motor development disorder (MDD), cognitive development disorder (CDD) and combined overall neurodevelopmental disorder (NDD) in offspring. Seventeen independent variables from 2002 to 2007 were included. Random forest variable importance and Shapley Additive Explanation (SHAP) values were calculated to identify the major predictors and to analyze the directions of their associations with the dependent variables. The random forest with oversampling registered much higher areas under the receiver-operating-characteristic curves than the logistic regression with interaction and non-linearity terms: 79% versus 50% (MDD), 82% versus 52% (CDD) and 74% versus 50% (NDD). Based on random forest variable importance, low socioeconomic status and age at birth were highly ranked. In SHAP values, there was a positive association between NDD and pre- or perinatal outcomes; in particular, fetal male sex with growth restriction was associated with the development of NDD in offspring.
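The two evaluation ingredients named in the abstract, oversampling of the rarer outcome class and the area under the receiver-operating-characteristic curve (AUC), can be illustrated with a minimal standard-library Python sketch. It is not the authors' pipeline: the abstract does not state which oversampling variant was used, so plain random duplication of minority-class records is assumed here, and the function names `random_oversample` and `roc_auc` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class.

    Assumption: simple random oversampling; the study may have used a
    different scheme.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def roc_auc(labels, scores):
    """AUC computed as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case,
    counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four unaffected records, one affected record.
rows = [[1], [2], [3], [4], [5]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)

# Toy scores (hypothetical): three of the four positive/negative pairs
# are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# A model that gives every case the same score cannot rank cases at all.
chance_auc = roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
```

In this reading, an AUC of 50% such as the one reported for the logistic regression corresponds to chance-level ranking, as `chance_auc` shows. Oversampling is normally applied only to the training portion of the data, so that the AUC is measured on records with the original class balance; whether the study followed that practice is not stated in the abstract.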

MeSH terms

  • Adult
  • Child
  • Child, Preschool
  • Female
  • Humans
  • Infant, Newborn
  • Machine Learning*
  • Male
  • Neurodevelopmental Disorders* / epidemiology
  • Neurodevelopmental Disorders* / etiology
  • Pregnancy
  • Republic of Korea / epidemiology
  • Retrospective Studies
  • Risk Factors