While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e. the predictive mean and variance at individual input locations. But it is often more useful to estimate predictive correlations between the function values at different input locations. In this paper, we consider the problem of benchmarking how accurately Bayesian models can estimate predictive correlations. We first consider a downstream task which depends on posterior predictive correlations: transductive active learning (TAL). We find that TAL makes better use of models' uncertainty estimates than ordinary active learning, and recommend this as a benchmark for evaluating Bayesian models. Since TAL is too expensive and indirect to guide the development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations and which can be computed efficiently: meta-correlations (i.e. the correlations between the models' correlation estimates and the true values), and cross-normalized likelihoods (XLL). We validate these metrics by demonstrating their consistency with TAL performance, and obtain insights about the relative performance of current Bayesian neural net and Gaussian process models.
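The meta-correlation metric described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: it assumes we have a model's estimated pairwise predictive correlation matrix and a ground-truth correlation matrix, and it computes the Pearson correlation between their off-diagonal entries. The function name, toy data, and noise level are all hypothetical.

```python
import numpy as np

def meta_correlation(est_corr: np.ndarray, true_corr: np.ndarray) -> float:
    """Pearson correlation between the off-diagonal entries of an estimated
    and a ground-truth correlation matrix (a sketch of the metric)."""
    iu = np.triu_indices_from(true_corr, k=1)  # unique off-diagonal pairs
    return float(np.corrcoef(est_corr[iu], true_corr[iu])[0, 1])

# Toy example (hypothetical data): a noisy estimate of a true correlation
# structure over four function values.
rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 4))
true_corr = np.corrcoef(samples, rowvar=False)
est_corr = np.clip(true_corr + 0.1 * rng.standard_normal(true_corr.shape), -1.0, 1.0)
print(meta_correlation(est_corr, true_corr))
```

A perfect model's correlation estimates would yield a meta-correlation of 1, while estimates unrelated to the true structure would yield values near 0.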