Posterior predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source library for controlled and principled evaluation of agents that generate such predictions. Crucially, agents are assessed not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a neural-network-based data-generating process. Our results indicate that some well-known agents that produce accurate marginal predictions do not fare well with joint predictions. We show that the quality of joint predictions drives performance in downstream decision tasks, and highlight the importance of this observation to the community.
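The distinction between marginal and joint predictions can be made concrete with a toy sketch. It is not the Neural Testbed API: the function names, the representation of an agent as a list of ensemble members, and all probability values below are assumptions chosen for illustration. The sketch builds two hypothetical agents that assign identical marginal probabilities to each input but different joint probabilities to a pair of labels.

```python
import math

# Hypothetical representation (an assumption, not the library's format):
# members[k][i][c] is the probability that ensemble member k assigns to
# class c at input i.


def marginal_prob(members, input_idx, label):
    """Probability of `label` at one input, averaged over ensemble members."""
    return sum(m[input_idx][label] for m in members) / len(members)


def joint_prob(members, labels):
    """Probability of a whole label sequence: each member multiplies its
    per-input probabilities, then the products are averaged over members."""
    total = 0.0
    for m in members:
        p = 1.0
        for input_idx, label in enumerate(labels):
            p *= m[input_idx][label]
        total += p
    return total / len(members)


def neg_log(p):
    """Log-loss of a single predicted probability."""
    return -math.log(p)


# Agent 1: two members that disagree with each other but are each
# consistent across inputs, so their predictions are correlated.
ensemble = [
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.1, 0.9], [0.1, 0.9]],
]

# Agent 2: a single member that is maximally unsure at every input.
point = [
    [[0.5, 0.5], [0.5, 0.5]],
]

labels = (0, 0)  # illustrative observed labels at the two inputs

for name, agent in [("ensemble", ensemble), ("point", point)]:
    marginal_loss = sum(
        neg_log(marginal_prob(agent, i, y)) for i, y in enumerate(labels)
    )
    joint_loss = neg_log(joint_prob(agent, labels))
    print(f"{name}: marginal log-loss={marginal_loss:.3f}, "
          f"joint log-loss={joint_loss:.3f}")
```

In this toy setting both agents incur the same marginal log-loss, yet the ensemble assigns more joint probability to label pairs that agree across inputs. A marginal-only evaluation cannot tell the two agents apart, whereas a joint evaluation can.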