Recreational water quality is currently monitored at Sandpoint Beach on Lake St. Clair using culture-based enumeration of Escherichia coli. Using water quality and weather data collected over 4 yr, several multiple linear regression (MLR)-based models were developed for near real-time prediction of E. coli concentration and were tested using independent data from the fifth year. Model performance was assessed using metrics such as root mean square error (RMSE), accuracy, specificity, sensitivity, and area under the receiver operating characteristic curve (AUROC). Each of the MLR models described herein produced more correct predictions of both exceedance and non-exceedance of the applicable standard than persistence models, which use the previous day's measured E. coli concentration as the prediction and represent the method currently in use. The AUROC values for persistence models are between 0.5 and 0.6, as compared to >0.7 for all the MLR models described herein. Among the MLR models, performance improved when qualitative sky weather condition, which is commonly reported but was not previously used in similar models, was included as a predictor. To select the best model, a principal coordinate analysis was used to combine multiple performance metrics and provide a more sensitive tool for model comparison. Although models developed using 2, 3, and 4 yr of monitoring data all provided reasonable performance, the model developed using the most recent 2 yr of data was marginally better. Thus, data from the most recent 2 yr are likely sufficient as a training dataset for updating the MLR model for Sandpoint Beach in the future.
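The exceedance metrics and the persistence baseline named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the threshold of 200 CFU/100 mL is an assumed stand-in for the "applicable standard" (which the text does not state), and the concentrations are invented toy values, not study data. AUROC is computed here with the rank-based (Mann-Whitney) formulation.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: placeholder regulatory limit in CFU/100 mL; the text does not give the value.
THRESHOLD = 200.0


def persistence_forecast(series: Sequence[float]) -> List[float]:
    """Persistence model: predict each day's value as the previous day's observation.

    Returns predictions for days 2..n, so compare them against series[1:].
    """
    return list(series[:-1])


def exceedance_metrics(observed: Sequence[float],
                       predicted: Sequence[float],
                       threshold: float = THRESHOLD) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for exceedance of the threshold."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        actual, flagged = obs > threshold, pred > threshold
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true negative rate
    }


def auroc(observed: Sequence[float],
          predicted: Sequence[float],
          threshold: float = THRESHOLD) -> float:
    """Probability that a true exceedance day gets a higher prediction than a
    non-exceedance day (ties count as one half)."""
    pos = [p for o, p in zip(observed, predicted) if o > threshold]
    neg = [p for o, p in zip(observed, predicted) if o <= threshold]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy daily concentrations (invented for illustration only).
daily = [50.0, 300.0, 250.0, 80.0, 400.0, 120.0]
predicted = persistence_forecast(daily)   # yesterday's value as today's forecast
observed = daily[1:]                      # the days that actually have a forecast

metrics = exceedance_metrics(observed, predicted)
score = auroc(observed, predicted)
```

The same two functions could be applied to the output of any regression model in place of `predicted`, which is how a persistence baseline and an MLR model would be compared on equal terms.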
© 2020 The Authors. Journal of Environmental Quality © 2020 American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America.