Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts a numerical score such that a correct forecast achieves a minimal expected score. In this paper, we construct e-values for testing the statistical significance of score differences of competing forecasts in sequential settings. E-values have been proposed as an alternative to p-values for hypothesis testing, and they can easily be transformed into conservative p-values by taking the multiplicative inverse. The e-values proposed in this article are valid in finite samples without any assumptions on the data-generating processes. They also allow optional stopping, so a forecast user may decide to interrupt the evaluation at any time, taking into account the data available so far, and still draw statistically valid inference, which is generally not true for classical p-value-based tests. In a case study on postprocessing of precipitation forecasts, state-of-the-art forecast dominance tests and e-values lead to the same conclusions.
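To make the idea of a sequential e-value concrete, the following is a minimal sketch, not necessarily the construction used in the paper. It assumes the Brier score as the proper scoring rule, a fixed betting fraction `lam` in [0, 1], and the null hypothesis that forecast `p` scores at least as well as forecast `q` in conditional expectation. The function names (`e_value_step`, `e_process`, `conservative_p_value`) are hypothetical and chosen for illustration only. Each factor is nonnegative and has expectation at most 1 under the null, so the running product can be monitored at any time.

```python
from typing import Iterable, List, Tuple


def brier(prob: float, y: int) -> float:
    """Brier score of a probability forecast for a binary outcome y in {0, 1}."""
    return (prob - y) ** 2


def e_value_step(p: float, q: float, y: int, lam: float = 0.5) -> float:
    """One-step e-value for H0: forecast p is at least as good as q.

    Assumption (illustrative): fixed betting fraction lam in [0, 1].
    The score difference is divided by its most negative possible value,
    so the result always lies in [1 - lam, infinity) and is nonnegative.
    """
    if p == q:
        return 1.0  # identical forecasts carry no evidence either way
    # Outcome under which p beats q by the largest margin.
    worst = 1 if p > q else 0
    scale = abs(brier(p, worst) - brier(q, worst))
    return 1.0 + lam * (brier(p, y) - brier(q, y)) / scale


def e_process(stream: Iterable[Tuple[float, float, int]], lam: float = 0.5) -> List[float]:
    """Running product of one-step e-values over (p, q, y) triples."""
    e, path = 1.0, []
    for p, q, y in stream:
        e *= e_value_step(p, q, y, lam)
        path.append(e)
    return path


def conservative_p_value(e: float) -> float:
    """Conservative p-value from an e-value: the reciprocal, capped at 1."""
    return min(1.0, 1.0 / e) if e > 0 else 1.0


if __name__ == "__main__":
    # Toy data: q = 0.8 is closer to the observed outcomes than p = 0.2.
    data = [(0.2, 0.8, 1), (0.2, 0.8, 1), (0.2, 0.8, 0), (0.2, 0.8, 1)]
    path = e_process(data)
    print(path)
    print(conservative_p_value(path[-1]))
```

Under these assumptions, a user could stop as soon as the running product exceeds 1/α and reject the null at level α; large values are evidence that `q` outperforms `p`. A fixed `lam` is the simplest choice; a data-dependent choice is possible as long as it uses only past observations.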