Proper scoring rules are commonly applied to quantify the accuracy of distribution forecasts. Given an observation, they assign a scalar score to each distribution forecast, with the lowest expected score attributed to the true distribution. The energy and variogram scores are two rules that have recently gained popularity in multivariate settings: their computation does not require a forecast to have a parametric density function, so they are broadly applicable. Here we conduct a simulation study to compare the discrimination ability of the energy score and three variogram scores. Compared with other studies, our simulation design is more realistic because it is supported by a historical data set containing commodity prices, currencies and interest rates, and our data generating processes include a diverse selection of models with different marginal distributions, dependence structures, and calibration windows. This facilitates a comprehensive comparison of the performance of proper scoring rules in different settings. To compare the scores we use three metrics: the mean relative score, the error rate and a generalised discrimination heuristic. Overall, we find that the variogram score with parameter p=0.5 outperforms the energy score and the other two variogram scores.
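Because both scores can be computed from forecast samples alone, without a parametric density, a short sketch may help make them concrete. The code below uses the standard sample-based estimators of the energy score and of the variogram score of order p. It is an illustration under stated assumptions, not the study's implementation: the function names `energy_score` and `variogram_score` are hypothetical, the variogram weights are all set to 1, and the forecast is assumed to be given as an array of m samples in d dimensions.

```python
import numpy as np


def energy_score(samples, y):
    """Sample-based energy score: E||X - y|| - 0.5 * E||X - X'||.

    samples: array of shape (m, d), draws from the forecast distribution.
    y:       array of shape (d,), the observation.
    Lower is better.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mean Euclidean distance between each forecast draw and the observation.
    term_obs = np.linalg.norm(samples - y, axis=1).mean()
    # Mean Euclidean distance over all ordered pairs of forecast draws
    # (identical pairs are included and contribute zero).
    pair_diffs = samples[:, None, :] - samples[None, :, :]
    term_spread = np.linalg.norm(pair_diffs, axis=2).mean()
    return term_obs - 0.5 * term_spread


def variogram_score(samples, y, p=0.5):
    """Sample-based variogram score of order p with unit weights (assumption).

    Sums, over all component pairs (i, j), the squared gap between
    |y_i - y_j|^p and the forecast mean of |X_i - X_j|^p.
    Lower is better.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    # Observed pairwise differences between components, shape (d, d).
    obs_vario = np.abs(y[:, None] - y[None, :]) ** p
    # Forecast pairwise differences, averaged over the m draws, shape (d, d).
    fc_vario = (np.abs(samples[:, :, None] - samples[:, None, :]) ** p).mean(axis=0)
    return float(((obs_vario - fc_vario) ** 2).sum())


# Toy usage: two forecast draws in two dimensions and one observation.
toy_samples = np.array([[0.0, 0.0], [2.0, 0.0]])
toy_obs = np.array([1.0, 0.0])
print(energy_score(toy_samples, toy_obs))
print(variogram_score(toy_samples, toy_obs, p=0.5))
```

The energy score depends on the full Euclidean distance between draws and the observation, while the variogram score only looks at differences between pairs of components, which is why the latter is often described as more sensitive to the dependence structure. Non-unit weights, if wanted, would multiply each squared term before the final sum.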