Bad and Good Errors: Value-Weighted Skill Scores in Deep Ensemble Learning

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1993-2002. doi: 10.1109/TNNLS.2022.3186068. Epub 2024 Feb 5.

Abstract

Forecast verification is a crucial task for assessing the predictive power of prognostic model forecasts, and it is usually implemented by checking quality-based skill scores. In this article, we propose a novel approach to forecast verification that focuses on the value of a forecast rather than only on its quality. Specifically, we introduce a strategy for assessing the severity of forecast errors based on two observations: on the one hand, a false alarm that just anticipates an occurring event is better than one in the middle of consecutive nonoccurring events; on the other hand, a miss of an isolated event has a worse impact than a miss of a single event that is part of several consecutive occurrences. Relying on this idea, we introduce a notion of value-weighted skill scores that give greater importance to the value of the prediction than to its quality. Then, we introduce an ensemble strategy to maximize quality-based and value-weighted skill scores independently of one another. We test it on the predictions provided by deep learning methods for binary classification in four applications concerned with pollution, space weather, stock price, and IoT data stream forecasting. Our experimental studies show that using the ensemble strategy to maximize the value-weighted skill scores generally improves both the value and the quality of the forecast.
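The error-severity idea described above can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything specific here is an assumption made for illustration only: the function names `weighted_errors` and `csi`, the `window` and `discount` parameters, the rule that a "good" error counts as a fraction of a "bad" one, and the choice of the critical success index as the skill score. It should not be read as the authors' definition.

```python
def weighted_errors(y_true, y_pred, window=1, discount=0.5):
    """Count true positives and severity-weighted false alarms and misses.

    Hypothetical weighting (not taken from the paper):
    - a false alarm counts `discount` if an event occurs within the next
      `window` steps (it anticipates the event), otherwise 1.0;
    - a miss counts `discount` if another event occurs within `window`
      steps on either side (it is part of a run), otherwise 1.0.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    tp = fp = fn = 0.0
    for t in range(len(y_true)):
        if y_pred[t] == 1 and y_true[t] == 1:
            tp += 1.0
        elif y_pred[t] == 1 and y_true[t] == 0:
            # False alarm: less severe if an event follows shortly.
            upcoming = y_true[t + 1 : t + 1 + window]
            fp += discount if any(upcoming) else 1.0
        elif y_pred[t] == 0 and y_true[t] == 1:
            # Miss: less severe if neighbouring steps also contain events.
            lo = max(0, t - window)
            neighbours = y_true[lo:t] + y_true[t + 1 : t + 1 + window]
            fn += discount if any(neighbours) else 1.0
    return tp, fp, fn


def csi(tp, fp, fn):
    """Critical success index, TP / (TP + FP + FN), from possibly weighted counts."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


y_true = [0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]

quality_csi = csi(*weighted_errors(y_true, y_pred, discount=1.0))  # unweighted
value_csi = csi(*weighted_errors(y_true, y_pred, discount=0.5))    # weighted
```

In this toy sequence, the false alarm at step 1 immediately precedes an event and the miss at step 3 sits next to another event, so both are discounted, while the false alarm at step 5 and the miss of the isolated event at step 7 keep full weight. Setting `discount=1.0` recovers the ordinary quality-based score.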