Background: Identifying language disorders earlier can help children receive the support needed to improve developmental outcomes and quality of life. Despite the prevalence and impact of persistent language disorder, no robust predictor tools are available. This makes it difficult for researchers to recruit young children into early intervention trials, which in turn impedes progress in providing effective early interventions to the children who need them.
Aims: To externally validate a set of six variables previously identified as predictive of language at 11 years of age, using data from the Longitudinal Study of Australian Children (LSAC) birth cohort. A secondary aim was to examine whether any additional LSAC variables emerged as predictive of language outcome.
Methods & procedures: A total of 5107 children were recruited to LSAC, with developmental measures collected from 0 to 3 years. At 11-12 years, children completed the Recalling Sentences subtest of the Clinical Evaluation of Language Fundamentals, 4th Edition. We used SuperLearner to estimate the accuracy of six previously identified parent-reported variables from ages 2-3 years in predicting low language (a sentence recall score at least 1.5 SD below the mean) at 11-12 years. Random forests were used to identify any additional variables predictive of language outcome.
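The low language definition above amounts to a simple threshold rule, which can be sketched as follows. This is a minimal illustration only: the function name `flag_low_language`, the toy scores, and the reference mean and SD are hypothetical and are not taken from the study, which may have derived its mean and SD differently.

```python
from statistics import mean, pstdev


def flag_low_language(scores, cutoff_sd=1.5, ref_mean=None, ref_sd=None):
    """Flag each score that lies at least `cutoff_sd` SDs below the mean.

    If no reference mean/SD is supplied, the sample's own mean and
    population SD are used (an assumption made for this sketch).
    """
    m = mean(scores) if ref_mean is None else ref_mean
    s = pstdev(scores) if ref_sd is None else ref_sd
    threshold = m - cutoff_sd * s
    return [x <= threshold for x in scores]


# Toy scores and illustrative reference values (hypothetical, not study data).
toy_scores = [10, 12, 9, 4, 11, 8]
flags = flag_low_language(toy_scores, ref_mean=10, ref_sd=3)
```

With these illustrative values the threshold is 10 - 1.5 × 3 = 5.5, so only scores at or below 5.5 are flagged.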
Outcomes & results: Complete data were available for 523 participants (52.20% girls), 27 (5.16%) of whom had a low language score. The six predictors yielded fair accuracy: 78% sensitivity (95% confidence interval (CI) = [58, 91]) and 71% specificity (95% CI = [67, 75]). These predictors relate to sentence complexity, vocabulary and behaviour. The random forests analysis identified similar predictors.
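To make the accuracy figures concrete, the sketch below computes sensitivity and specificity from a 2 × 2 table. The cell counts are assumptions back-derived from the reported percentages (27 low language cases among 523 participants) and are not reported in the abstract; in particular, several true-negative counts are consistent with 71% specificity. The Wilson score interval is also an assumption, used here because it needs only the standard library; the study's interval method is not stated, so these intervals need not match the reported ones.

```python
from math import sqrt


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# ASSUMED counts, inferred from the reported percentages (not reported data):
tp, fn = 21, 6        # 27 children with low language
tn, fp = 352, 144     # 496 children without low language

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity = {sens:.2%}, CI = ({sens_ci[0]:.2%}, {sens_ci[1]:.2%})")
print(f"specificity = {spec:.2%}, CI = ({spec_ci[0]:.2%}, {spec_ci[1]:.2%})")
```

The much wider interval for sensitivity reflects the small number of low language cases on which it is estimated.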
Conclusions & implications: We identified an ultra-short set of variables that predicts 11-12-year language outcome with 'fair' accuracy. In one of the few replication studies of this scale in the field, these methods have now been applied across two population-based cohorts, with consistent results. An immediate practical implication of these findings is the use of these predictors to aid recruitment into early language intervention studies. Future research can continue to refine the accuracy of early predictors to work towards earlier identification in a clinical context.
What this paper adds:
What is already known on the subject: There are no robust predictor sets of child language disorder despite its prevalence and far-reaching impacts. A previous study identified six variables collected at age 2-3 years that predicted 11-12-year language with 75% sensitivity and 81% specificity, which warranted replication in a separate cohort.
What this study adds to the existing knowledge: We used machine learning methods to identify a set of six questions asked at age 2-3 years with ≥ 71% sensitivity and specificity for predicting low language outcome at 11-12 years, now showing consistent results across two large-scale population-based cohort studies.
What are the potential or clinical implications of this work? This predictor set is more accurate than existing feasible methods and can be translated into a low-resource and time-efficient recruitment tool for early language intervention studies, leading to improved clinical service provision for young children likely to have persisting language difficulties.
Keywords: SuperLearner; language disorders; longitudinal studies; machine learning; random forests; sensitivity and specificity.
© 2024 The Author(s). International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.