Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison

It is useful to estimate the expected predictive performance of models that are planned to be used for prediction. We focus on leave-one-out cross-validation (LOO-CV), which has become a popular method for estimating the predictive performance of Bayesian models. Given two models, we are interested in comparing their predictive performances and the associated uncertainty, which can also be used to compute the probability that one model has better predictive performance than the other. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty quantification for the predictive performance difference, and analyse when a normal approximation of this uncertainty is well calibrated and whether taking higher moments into account could improve the approximation. We provide new results on these properties, both theoretically in the linear regression case and empirically for hierarchical linear, latent linear, and spline models, and discuss the challenges. We show that problematic cases include comparing models with similar predictions, misspecified models, and small data. In these cases, there is only a weak connection between the distribution of the LOO-CV estimator and the distribution of its error. We show that the problematic skewness of the error distribution for the difference, which occurs when the models make similar predictions, does not fade away in certain situations even when the data size grows to infinity. Based on the results, we also provide some practical recommendations for users of Bayesian LOO-CV for comparing the predictive performance of models.
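To make the quantities in the abstract concrete, the following is a minimal sketch of the commonly used normal approximation for a LOO-CV difference between two models. It is not taken from the paper. It assumes that the pointwise LOO log predictive densities of both models have already been computed for the same observations, and the function name `compare_loo` and the synthetic inputs are hypothetical. The skewness it reports is that of the pointwise differences, used here only as a rough diagnostic; it is not the skewness of the error distribution that the paper analyses.

```python
import math

import numpy as np


def compare_loo(lpd_a, lpd_b):
    """Normal-approximation comparison of two models from pointwise LOO values.

    lpd_a, lpd_b: pointwise LOO log predictive densities for models A and B,
    evaluated on the same n observations (assumed to be precomputed).
    Assumes the pointwise differences are not all identical (otherwise se is 0).
    """
    d = np.asarray(lpd_a, dtype=float) - np.asarray(lpd_b, dtype=float)
    n = d.size
    elpd_diff = d.sum()                    # estimated difference, A minus B
    se = math.sqrt(n * d.var(ddof=1))      # standard error of the summed difference
    z = elpd_diff / se
    # Normal-approximation probability that A predicts better than B.
    prob_a_better = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Sample skewness of the pointwise differences (rough diagnostic only).
    centered = d - d.mean()
    skew = (centered ** 3).mean() / (centered ** 2).mean() ** 1.5
    return elpd_diff, se, prob_a_better, skew


# Synthetic illustration only: these values do not come from any fitted model.
rng = np.random.default_rng(0)
lpd_a = rng.normal(-1.0, 0.5, size=50)
lpd_b = lpd_a - rng.normal(0.05, 0.2, size=50)
elpd_diff, se, prob, skew = compare_loo(lpd_a, lpd_b)
print(f"elpd_diff={elpd_diff:.2f}  se={se:.2f}  P(A better)={prob:.2f}  skew={skew:.2f}")
```

In line with the abstract, the probability from such a sketch should be read with caution when the two models make similar predictions, when a model is misspecified, or when the data set is small, since those are the cases in which the normal approximation is reported to be poorly calibrated.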

