To identify expertise, forecasters should be tested not by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how well the periods are sorted into bins with the same forecast, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
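For concreteness, the decomposition referred to above can be written out for binary outcomes as follows (the notation is ours, not the abstract's): a lower refinement score means the bins separate the 0-outcomes from the 1-outcomes more sharply.

```latex
% Brier score decomposition for outcomes a_t in {0,1} and forecasts c_t
% taking finitely many values x over T periods (illustrative notation):
%   n_x       = number of periods t <= T with c_t = x
%   \bar a_x  = average outcome over those periods
\[
  \underbrace{\frac{1}{T}\sum_{t=1}^{T}(c_t - a_t)^2}_{\text{Brier score}}
  \;=\;
  \underbrace{\sum_{x}\frac{n_x}{T}\,(x - \bar a_x)^2}_{\text{calibration score}}
  \;+\;
  \underbrace{\sum_{x}\frac{n_x}{T}\,\bar a_x(1 - \bar a_x)}_{\text{refinement score}}
\]
```

The calibration term vanishes whenever each forecast value matches the empirical frequency in its bin, while the refinement term depends only on how the periods are sorted into bins, which is why it captures "expertise."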
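As an illustration of how a deterministic online procedure of this kind might operate, here is a minimal sketch of a simple "bin-average" scheme: at each period, forecast the running average of past outcomes in the bin of the expert's current forecast. All names (`calibeat`, the default value 0.5 for an empty bin) are illustrative assumptions, not taken from the paper; this is a sketch of the idea, not the authors' exact construction.

```python
from collections import defaultdict

def calibeat(expert_forecasts, outcomes):
    """Online bin-average scheme (illustrative sketch): at each period,
    forecast the average of past outcomes in the bin of the expert's
    current forecast.

    expert_forecasts: sequence of forecasts in [0, 1]; equal values = same bin
    outcomes:         sequence of realized outcomes in {0, 1}
    Returns the list of new forecasts, produced online.
    """
    count = defaultdict(int)    # periods seen so far per bin
    total = defaultdict(float)  # sum of outcomes so far per bin
    new_forecasts = []
    for c, a in zip(expert_forecasts, outcomes):
        # Forecast the empirical frequency in bin c (0.5 if the bin is empty,
        # an arbitrary illustrative default).
        b = total[c] / count[c] if count[c] > 0 else 0.5
        new_forecasts.append(b)
        # Update the bin statistics only after the outcome is revealed.
        count[c] += 1
        total[c] += a
    return new_forecasts
```

The intuition is that the new forecast reuses the expert's bins, so it inherits the expert's sorting, while within each bin it tracks the empirical frequency and so becomes calibrated there; in this sense it gains calibration without giving up the expertise encoded in the refinement score.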