Evaluating forecasts is essential to understand and improve forecasting and to make forecasts useful to decision-makers. Much theoretical work has been done on the development of proper scoring rules and other metrics that can help evaluate forecasts. In practice, however, conducting a forecast evaluation and comparing different forecasters remains challenging. In this paper we introduce scoringutils, an R package that aims to greatly facilitate this process. It is especially geared towards comparing multiple forecasters, regardless of how the forecasts were created, and towards visualising results. The package can handle missing forecasts and is the first R package to offer extensive support for forecasts represented through predictive quantiles, a format used by several collaborative ensemble forecasting efforts. The paper gives a short introduction to forecast evaluation, discusses the metrics implemented in scoringutils along with guidance on when they are appropriate to use, and illustrates the application of the package using example data of forecasts for COVID-19 cases and deaths submitted to the European Forecast Hub between May and September 2021.
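As a brief illustration of the workflow described above, the following is a minimal sketch assuming a scoringutils version that provides the score() and summarise_scores() functions and ships the example_quantile data set of quantile-format COVID-19 forecasts; it is not a substitute for the worked example in the paper.

```r
# Minimal sketch: score quantile-format forecasts and aggregate by model.
# Assumes scoringutils with score(), summarise_scores(), and the bundled
# example_quantile data set.
library(scoringutils)

# score() computes per-forecast metrics; summarise_scores() averages them,
# here grouped by model to allow a comparison across forecasters
scores <- score(example_quantile)
summarise_scores(scores, by = "model")
```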