Though introduced nearly 50 years ago, the infinitesimal jackknife (IJ) remains a popular modern tool for quantifying predictive uncertainty in complex estimation settings. In particular, when supervised learning ensembles are constructed via bootstrap samples, recent work has demonstrated that the IJ estimate of variance is particularly convenient and useful. However, despite the algebraic simplicity of its final form, its derivation is rather complex. As a result, studies clarifying the intuition behind the estimator or rigorously investigating its properties have been severely lacking. This work aims to take a step forward on both fronts. We demonstrate that, surprisingly, the exact form of the IJ estimator can be obtained either via a straightforward linear regression of the individual bootstrap estimates on their respective weights or via the classical jackknife. The latter realization is particularly useful, as it allows us to formally investigate the bias of the IJ variance estimator and to better characterize the settings in which its use is appropriate. Finally, we extend these results to the case of U-statistics, where base models are constructed via subsampling rather than bootstrapping, and provide a consistent estimate of the resulting variance.
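To make the "algebraic simplicity of its final form" concrete, the following is a minimal sketch of the covariance form in which the IJ variance estimate for a bagged ensemble is commonly stated in the bootstrap literature: the sum, over training observations, of the squared empirical covariance between the number of times an observation appears in each bootstrap sample and the corresponding base estimate. The abstract itself does not give the formula, so this form, the function names `ij_variance` and `bagged_mean_demo`, and the choice of the sample mean as a toy base estimator are all assumptions made for illustration. The sketch does not reproduce the regression or jackknife derivations described above.

```python
import numpy as np


def ij_variance(counts, estimates):
    """Covariance-form IJ variance estimate (illustrative sketch).

    counts:    (B, n) array; counts[b, i] is how often observation i
               appears in bootstrap sample b.
    estimates: (B,) array; estimates[b] is the base estimate computed
               from bootstrap sample b.
    """
    counts = np.asarray(counts, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    n_boot = estimates.shape[0]
    # Center both quantities across the B bootstrap replicates.
    centered_counts = counts - counts.mean(axis=0)
    centered_estimates = estimates - estimates.mean()
    # Empirical covariance between each observation's count and the estimate.
    cov = centered_counts.T @ centered_estimates / n_boot
    # Sum of squared covariances over the n observations.
    return float(np.sum(cov ** 2))


def bagged_mean_demo(x, n_boot, rng):
    """Toy example (assumption): the base estimator is the sample mean."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Bootstrap resampling expressed as multinomial counts, shape (B, n).
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    estimates = counts @ x / n
    return ij_variance(counts, estimates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=30)
    print("IJ variance estimate:", bagged_mean_demo(x, 5000, rng))
    print("Plug-in variance of the mean:", x.var() / x.size)
```

For the sample mean, the two printed values should be close when the number of bootstrap replicates is large relative to the sample size; with few replicates the covariance form is known to be inflated by Monte Carlo noise.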