Improved Decision Making for Water Lead Testing in U.S. Child Care Facilities Using Machine-Learned Bayesian Networks

Riley E Mulhern; A J Kondash; Ed Norman; Joseph Johnson; Keith Levine; Andrea McWilliams; Melanie Napier; Frank Weber; Laurie Stella; Erica Wood; Crystal Lee Pow Jackson; Sarah Colley; Jamie Cajka; Jacqueline MacDonald Gibson; Jennifer Hoponick Redmon

doi:10.1021/acs.est.2c07477


Environ Sci Technol. 2023 Nov 21;57(46):17959-17970. doi: 10.1021/acs.est.2c07477. Epub 2023 Mar 18.


Affiliations

¹ RTI International, Research Triangle Park, North Carolina 27709, United States.
² Environmental Health Section, Division of Public Health, North Carolina Department of Health and Human Services, Raleigh, North Carolina 27609, United States.
³ Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States.

Abstract

Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina, based on the maximum and 90th percentile lead concentrations measured at 22,943 taps. The performance of the BN models was compared to that of common alternative risk factors, or heuristics, used to inform water lead testing programs in child care facilities, including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead; facilities that serve low-income families, rely on groundwater, and have more taps exhibited greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' Fβ-scores exceeded those of each alternative heuristic by 118-213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified, and up to a 49% decrease in the number of samples that would need to be collected, when using BN model-informed sampling rather than simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk, which could improve lead testing programs nationwide.
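The abstract classifies facilities by their maximum and 90th percentile tap lead levels and compares BN models with heuristics using Fβ-scores. The sketch below shows one way those facility-level outcomes and the Fβ comparison could be computed; it is not the authors' code, and it does not fit a Bayesian network. Assumptions, none of which come from the study: the target concentration is a hypothetical parameter `target_ppb`; "exceeding" means strictly greater than the target; the 90th percentile uses the nearest-rank method (other definitions exist); β = 2 is an illustrative choice because the abstract does not state β; and "outperformed by X%" is read as relative improvement. All data in the demo are toy values.

```python
import math
from typing import Dict, Sequence


def nearest_rank_percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile using the nearest-rank method.

    ASSUMPTION: nearest-rank is one of several percentile definitions;
    the abstract does not say which one the study used.
    """
    if not values:
        raise ValueError("need at least one tap sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def facility_labels(tap_ppb: Sequence[float], target_ppb: float) -> Dict[str, bool]:
    """Label one facility from its tap lead results (ppb).

    ASSUMPTION: target_ppb is a hypothetical parameter, and "exceeds"
    is taken to mean strictly greater than the target.
    """
    return {
        # Any single tap above the target.
        "max_exceeds": max(tap_ppb) > target_ppb,
        # Upper tail of the facility's taps (90th percentile) above the target.
        "p90_exceeds": nearest_rank_percentile(tap_ppb, 90) > target_ppb,
    }


def f_beta(y_true: Sequence[bool], y_pred: Sequence[bool], beta: float = 2.0) -> float:
    """F-beta score for boolean high-risk labels.

    ASSUMPTION: beta = 2.0 is illustrative (recall weighted above precision);
    the abstract does not state the beta used in the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)      # high-risk and flagged
    fp = sum(1 for t, p in pairs if not t and p)  # low-risk but flagged
    fn = sum(1 for t, p in pairs if t and not p)  # high-risk but missed
    if tp == 0:
        return 0.0  # precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def percent_improvement(new: float, baseline: float) -> float:
    """Relative gain of new over baseline, in percent.

    ASSUMPTION: one plausible reading of "outperformed by X%".
    """
    return (new - baseline) / baseline * 100


if __name__ == "__main__":
    # Hypothetical tap results (ppb) and a hypothetical 10 ppb target.
    print(facility_labels([0.5] * 9 + [30.0], target_ppb=10.0))
    # {'max_exceeds': True, 'p90_exceeds': False}

    # Toy ground truth and two toy predictors (not study data).
    truth = [True, True, False, False, True]
    model = [True, True, True, False, False]
    heuristic = [True, False, True, True, False]
    f_model, f_heur = f_beta(truth, model), f_beta(truth, heuristic)
    print(round(f_model, 3), round(f_heur, 3), round(percent_improvement(f_model, f_heur)))
```

The demo facility shows why the two outcomes can disagree: one very high tap trips the maximum-based label, while the 90th percentile stays low. In a full workflow, the predicted labels would come from the fitted BN (for example, by thresholding its predicted exceedance probability); that step is outside this sketch.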

Keywords: children’s health; drinking water; lead; machine learning; risk assessment.

MeSH terms

Bayes Theorem
Child
Child Care
Decision Making
Drinking Water*
Humans
Lead* / analysis
Water

Substances

Lead
Water
Drinking Water