Inference from limited data requires a notion of measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. Jeffreys prior is the best-known uninformative choice, the invariant volume element from information geometry, but we demonstrate here that this leads to enormous bias in typical high-dimensional models. This is because models found in science typically have an effective dimensionality of accessible behaviours much smaller than the number of microscopic parameters. Any measure which treats all of these parameters equally is far from uniform when projected onto the sub-space of relevant parameters, due to variations in the local co-volume of irrelevant directions. We present results on a principled choice of measure which avoids this issue, and leads to unbiased posteriors, by focusing on relevant parameters. This optimal prior depends on the quantity of data to be gathered, and approaches Jeffreys prior in the asymptotic limit. But for typical models this limit cannot be justified without an impossibly large increase in the quantity of data, exponential in the number of microscopic parameters.
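The Jeffreys prior mentioned above is the square root of the determinant of the Fisher information matrix. As a minimal illustration (not the models studied in this work), the sketch below computes it numerically for a one-parameter Bernoulli model, where the Fisher information is analytically I(p) = 1/(p(1-p)), so the Jeffreys prior is proportional to p^(-1/2)(1-p)^(-1/2), i.e. a Beta(1/2, 1/2) density up to normalisation:

```python
import numpy as np

def fisher_information_bernoulli(p: float) -> float:
    """Fisher information of a single Bernoulli(p) observation.

    I(p) = E[(d/dp log P(x|p))^2], summing over the two outcomes:
      x = 1: P = p,     d/dp log p     =  1/p
      x = 0: P = 1 - p, d/dp log(1-p)  = -1/(1-p)
    """
    return p * (1.0 / p) ** 2 + (1.0 - p) * (1.0 / (1.0 - p)) ** 2

def jeffreys_unnormalised(p: float) -> float:
    """Unnormalised Jeffreys prior: sqrt(det I(p)); here I is a scalar."""
    return np.sqrt(fisher_information_bernoulli(p))

# The numerical construction matches the analytic form 1/sqrt(p(1-p)),
# which diverges at the boundaries p -> 0 and p -> 1.
for p in (0.1, 0.5, 0.9):
    assert np.isclose(jeffreys_unnormalised(p), 1.0 / np.sqrt(p * (1.0 - p)))
```

In higher dimensions the same recipe applies with the determinant of the full Fisher matrix; the abstract's point is that in many-parameter models with few relevant directions, this volume element concentrates weight in ways that bias the projection onto the relevant subspace.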