A learning procedure takes as input a dataset and performs inference for the parameters $\theta$ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about $\theta$ after seeing the dataset. Bayesian inference is a prime example of such a procedure, but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure whose inferences and predictions are systematically over- or under-confident will fail to be calibrated. On the other hand, a learning procedure that is calibrated need not be statistically efficient. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Several vignettes are presented to illustrate different aspects of the framework.
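The following is a minimal illustrative sketch of the kind of simulation-based calibration check the abstract alludes to, not the paper's actual hypothesis-testing framework. It assumes a conjugate normal-mean model as the "learning procedure"; the prior, noise level, and sample sizes are arbitrary choices made for the example.

```python
import numpy as np
from scipy import stats

# Illustrative sketch (assumed example, not the paper's exact test):
# repeatedly simulate data from known parameters, run the learning
# procedure, and check whether the true parameters look like plausible
# samples from its distributional output.

rng = np.random.default_rng(0)
n_trials, n_obs, n_post = 200, 20, 500
sigma = 1.0            # known observation noise (assumption)
mu0, tau0 = 0.0, 1.0   # prior mean and std for theta (assumption)

ranks = []
for _ in range(n_trials):
    theta_true = rng.normal(mu0, tau0)             # data-generating parameter
    y = rng.normal(theta_true, sigma, size=n_obs)  # simulated dataset
    # Conjugate posterior for the mean: this plays the role of the
    # "learning procedure" whose output is a probability distribution.
    prec = 1.0 / tau0**2 + n_obs / sigma**2
    post_mean = (mu0 / tau0**2 + y.sum() / sigma**2) / prec
    post_std = prec**-0.5
    draws = rng.normal(post_mean, post_std, size=n_post)
    # Rank of the true parameter among draws from the output distribution
    ranks.append(np.sum(draws < theta_true))

# If the procedure is calibrated, these ranks should be approximately
# uniform; a Kolmogorov-Smirnov test gives a rough hypothesis-testing
# style assessment of that uniformity.
u = (np.array(ranks) + 0.5) / (n_post + 1)
print(stats.kstest(u, "uniform"))
```

In this sketch, a systematically over-confident procedure would concentrate the ranks near the extremes, and an under-confident one would concentrate them near the middle, so either failure of calibration shows up as a departure from uniformity.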