Concurrent Validity and Feasibility of Short Tests Currently Used to Measure Early Childhood Development in Large Scale Studies

PLoS One. 2016 Aug 22;11(8):e0160962. doi: 10.1371/journal.pone.0160962. eCollection 2016.

Abstract

In low- and middle-income countries (LMICs), measuring early childhood development (ECD) with standard tests in large scale surveys and evaluations of interventions is difficult and expensive. Multi-dimensional screeners and single-domain tests ('short tests') are frequently used as alternatives. However, their validity in these circumstances is unknown. We examined the feasibility, reliability, and concurrent validity of three multi-dimensional screeners (Ages and Stages Questionnaires (ASQ-3), Denver Developmental Screening Test (Denver-II), Battelle Developmental Inventory screener (BDI-2)) and two single-domain tests (MacArthur-Bates Short-Forms (SFI and SFII), WHO Motor Milestones (WHO-Motor)) in 1,311 children aged 6-42 months in Bogota, Colombia. The scores were compared with those on the Bayley Scales of Infant and Toddler Development (Bayley-III), taken as the 'gold standard'. The Bayley-III was given at a center by psychologists, whereas the short tests were administered in the home by interviewers, as in a survey setting. Findings indicated good internal validity of all short tests except the ASQ-3. The BDI-2 took a long time to administer and was expensive, while the single-domain tests were quickest and cheapest and the Denver-II and ASQ-3 were intermediate. Concurrent validity of the multi-dimensional tests' cognitive, language, and fine motor scales with the corresponding Bayley-III scale was low below 19 months. However, it increased with age, becoming moderate-to-high over 30 months. In contrast, gross motor scales' concurrence was high under 19 months and then decreased. Of the single-domain tests, the WHO-Motor had high validity with gross motor under 16 months, and the SFI and SFII expressive scales showed moderate correlations with language under 30 months. Overall, the Denver-II was the most feasible and valid multi-dimensional test and the ASQ-3 performed poorly under 31 months. By domain, gross motor development had the highest concurrence below 19 months, and language above. Predictive validity investigation is needed to further guide the choice of instruments for large scale studies.
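
For readers unfamiliar with the term, concurrent validity by age band can be illustrated as the correlation between a short-test scale and the corresponding gold-standard scale, computed separately within each age group. The sketch below is illustrative only: it assumes a Pearson correlation, and the function names, the record layout, and the age bands are hypothetical choices made for the example rather than the statistic, data structure, or cut-offs used in the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def concurrent_validity_by_age(records, bands):
    """Correlate short-test and reference scores within each age band.

    records: iterable of (age_months, short_score, reference_score) tuples
             (hypothetical layout, one tuple per child).
    bands:   list of (low, high) age limits in months, both inclusive
             (hypothetical bands, not the study's own grouping).
    Returns a dict mapping each band to its correlation coefficient.
    """
    records = list(records)
    result = {}
    for low, high in bands:
        in_band = [(s, r) for age, s, r in records if low <= age <= high]
        short_scores = [s for s, _ in in_band]
        reference_scores = [r for _, r in in_band]
        result[(low, high)] = pearson(short_scores, reference_scores)
    return result
```

Calling concurrent_validity_by_age with bands such as [(6, 18), (19, 30), (31, 42)] would return one coefficient per band, which is the kind of age-specific comparison the abstract summarizes in words.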

MeSH terms

  • Child Development / physiology*
  • Child, Preschool
  • Colombia
  • Developing Countries
  • Female
  • Humans
  • Infant
  • Language Tests / standards*
  • Male
  • Motor Skills / physiology*
  • Neuropsychological Tests / standards*
  • Poverty
  • Psychometrics / methods*
  • Psychomotor Performance / physiology*

Grants and funding

Data collection was funded by Fund RG-T1907 from the Inter-American Development Bank (IDB). Rubio-Codina’s research time was partly financed by the Leverhulme Trust Early Career Fellowship ECF/2008/0170. Attanasio’s research time was partly financed by the European Research Council (ERC) Advanced Grant 249612 and the Economic and Social Research Council (ESRC) Professorial Fellowship ES/K010700/1. The funder (IDB) provided support in the form of salaries for authors (MRC, MCA), but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the IDB, its Board of Directors, or the countries they represent.