Test error estimation is a fundamental problem in statistics and machine learning. Correctly assessing the future performance of an algorithm is essential, especially as complex predictive algorithms increasingly require data-driven parameter tuning. We propose a new coupled bootstrap estimator for the test error of algorithms with Poisson responses; the Poisson model is fundamental for count data, with applications in signal processing, density estimation, and queueing theory. The idea behind our estimator is to generate two carefully designed random vectors from the original data, one acting as a training sample and the other as a test set. The estimator is unbiased for an intuitive parameter: the out-of-sample error of a Poisson random vector whose mean has been shrunk by a small factor. Moreover, in a limiting regime, the coupled bootstrap estimator recovers an exactly unbiased estimator of test error. Our framework applies to loss functions in the Bregman divergence family, and our analysis and examples focus on two important cases: Poisson likelihood deviance and squared loss. Through a bias-variance decomposition, we analyze the effect of the number of bootstrap samples and of the noise added by the two auxiliary variables. We then apply our method to several scenarios with both simulated and real data.
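To make the train/test construction concrete, the following is a minimal sketch for squared loss, not the exact estimator of the paper. It assumes the two vectors are produced by binomial thinning: if y_i ~ Poisson(mu_i) and omega_i | y_i ~ Binomial(y_i, eps), then y_i - omega_i ~ Poisson((1 - eps) mu_i) and omega_i ~ Poisson(eps mu_i) are independent. The function names, the parameter `eps`, and the variance correction term are illustrative assumptions derived from this thinning property; the paper's construction and notation may differ.

```python
import numpy as np


def thin_poisson(y, eps, rng):
    """Split counts y into a training vector and a rescaled test vector.

    Assumption: y_i ~ Poisson(mu_i). Binomial thinning then yields
    independent y_train_i ~ Poisson((1 - eps) * mu_i) and
    omega_i ~ Poisson(eps * mu_i); omega / eps has mean mu_i.
    """
    y = np.asarray(y)
    omega = rng.binomial(y, eps)   # thinned counts, 0 <= omega <= y
    y_train = y - omega            # mean shrunk by the factor (1 - eps)
    y_test = omega / eps           # rescaled so that E[y_test] = mu
    return y_train, y_test


def coupled_bootstrap_sq_error(y, fit, eps=0.1, n_boot=100, seed=0):
    """Average a noise-corrected held-out squared error over n_boot splits.

    `fit` is a hypothetical user-supplied function mapping a count vector
    to a vector of predicted means.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        y_train, y_test = thin_poisson(y, eps, rng)
        pred = fit(y_train)
        raw = np.sum((y_test - pred) ** 2)
        # Var(y_test_i) = mu_i / eps, whereas a fresh Poisson(mu_i) draw has
        # variance mu_i. Subtract an unbiased estimate of the excess
        # (1 / eps - 1) * sum(mu_i), using E[y_test_i] = mu_i.
        correction = (1.0 / eps - 1.0) * np.sum(y_test)
        estimates.append(raw - correction)
    return float(np.mean(estimates))
```

For example, with `y = np.random.default_rng(1).poisson(5.0, size=50)` and the identity fit `fit = lambda v: v.astype(float)`, calling `coupled_bootstrap_sq_error(y, fit, eps=0.1)` returns one estimate of the summed squared test error for a model trained on data with mean (1 - eps) mu. In this sketch a smaller `eps` brings the training data closer to the original at the cost of a noisier test vector, while a larger `n_boot` only averages out the randomness introduced by the auxiliary draws.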