Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina according to maximum and 90th percentile lead levels from water lead concentrations at 22,943 taps. The performance of the BN models was compared to common alternative risk factors, or heuristics, used to inform water lead testing programs among child care facilities including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead, with facilities that serve low-income families, rely on groundwater, and have more taps exhibiting greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' Fβ-scores outperformed each of the alternative heuristics by 118-213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified and up to a 49% decrease in the number of samples that would need to be collected by using BN model-informed sampling compared to using simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk that could improve lead testing programs nationwide.
Keywords: children’s health; drinking water; lead; machine learning; risk assessment.