The Neural Testbed: Evaluating Joint Predictions

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces \textit{The Neural Testbed}: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions. We also show that the quality of joint predictions drives performance in downstream decision tasks. We find these results are robust across a wide range of generative models, and highlight the practical importance of joint predictions to the community.
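
The distinction between marginal and joint predictions can be made concrete with a small sketch. The code below is illustrative only: it is not the testbed's own code or metric definition, and the function names, the ensemble representation (`member_probs[k][i][c]` is the probability that hypothetical ensemble member `k` assigns to class `c` on input `i`), and the use of plain log-loss are all assumptions made for this example. It shows two agents with identical marginal predictions whose joint predictions differ.

```python
import math


def marginal_log_loss(member_probs, labels):
    """Mean negative log-likelihood per input, using ensemble-averaged marginals."""
    num_members = len(member_probs)
    total = 0.0
    for i, y in enumerate(labels):
        # Average over members first: this is the agent's marginal for input i.
        p = sum(member[i][y] for member in member_probs) / num_members
        total -= math.log(p)
    return total / len(labels)


def joint_log_loss(member_probs, labels):
    """Negative log-likelihood of the whole label sequence.

    The product over inputs is taken inside each member before averaging,
    so dependence between inputs is preserved.
    """
    num_members = len(member_probs)
    joint = 0.0
    for member in member_probs:
        joint += math.prod(member[i][y] for i, y in enumerate(labels))
    return -math.log(joint / num_members)


# Hypothetical agent A: two members that disagree, each consistent across inputs.
correlated = [
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.1, 0.9], [0.1, 0.9]],
]
# Hypothetical agent B: a single member that is maximally unsure on every input.
independent = [
    [[0.5, 0.5], [0.5, 0.5]],
]
labels = [0, 0]

for name, agent in [("correlated", correlated), ("independent", independent)]:
    print(name, marginal_log_loss(agent, labels), joint_log_loss(agent, labels))
```

Both agents assign probability 0.5 to each label on each input, so their marginal log-loss is the same. Agent A assigns joint probability 0.41 to the observed pair of labels, while agent B assigns 0.25, so only the joint evaluation separates them.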
