The classical paradigm of scoring rules is to discriminate between two different forecasts by comparing them with observations. The probability distribution of the observed record is assumed to be a perfect verification benchmark. In practice, however, observations are almost always tainted by errors and uncertainties. If the yardstick used to compare forecasts is imprecise, one may wonder whether such errors strongly influence decisions based on classical scoring rules. We propose a new scoring framework that accounts for errors in the verification data. Building on existing scoring rules, we incorporate the uncertainty and error of the verification data through a hidden variable and the conditional expectation of scores viewed as random variables. The proposed scoring framework is compared to scores used in practice, and is expressed in various setups, mainly an additive Gaussian noise model and a multiplicative Gamma noise model. Considering scores as random variables gives access to their entire distribution. In particular, we illustrate that the commonly used mean score can be a misleading summary of the distribution when the latter is highly skewed or has heavy tails. In a simulation study, through the power of a statistical test and the computation of Wasserstein distances between score distributions, we demonstrate that the newly proposed score discriminates better between forecasts than the scores used in practice when verification data are subject to uncertainty. Finally, we illustrate the benefit of accounting for the uncertainty of the verification data in the scoring procedure on a dataset of surface wind speed from measurements and numerical model outputs.
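The effect described above can be sketched numerically. The following is a minimal illustration, not the paper's actual framework: it scores a calibrated Gaussian forecast with the closed-form CRPS, once against the hidden true states and once against observations corrupted by additive Gaussian noise (the noise level 0.5 is an arbitrary assumption), then compares the two score distributions with the one-dimensional Wasserstein-1 distance.

```python
import numpy as np
from math import erf, exp, sqrt, pi

def crps_gaussian(mu, sigma, y):
    # Closed-form CRPS of a N(mu, sigma^2) forecast against observation y
    z = (y - mu) / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))      # standard normal CDF
    phi = exp(-z * z / 2.0) / sqrt(2.0 * pi)     # standard normal PDF
    return sigma * (z * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / sqrt(pi))

rng = np.random.default_rng(0)
n = 10_000
truth = rng.normal(0.0, 1.0, n)            # hidden "true" states
obs = truth + rng.normal(0.0, 0.5, n)      # additive Gaussian observation noise

# Score the same calibrated forecast N(0, 1) against both verification records
scores_true = np.array([crps_gaussian(0.0, 1.0, y) for y in truth])
scores_obs = np.array([crps_gaussian(0.0, 1.0, y) for y in obs])

# Noisy verification data shift the whole score distribution, not just its mean
print("mean CRPS vs truth:", scores_true.mean())
print("mean CRPS vs noisy obs:", scores_obs.mean())

# Wasserstein-1 distance between the two score distributions
# (for 1-D samples of equal size: mean absolute difference of sorted samples)
w1 = np.abs(np.sort(scores_true) - np.sort(scores_obs)).mean()
print("W1 between score distributions:", w1)
```

The mean score against the noisy record is inflated relative to the noise-free one, which is exactly why a scoring procedure that ignores verification uncertainty can mislead forecast comparison.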