Choosing models from a hypothesis space is a frequent task in approximation theory and inverse problems. Cross-validation is a classical tool in the learner's repertoire for comparing the goodness of fit of different reconstruction models. Much work has been dedicated to computing this quantity quickly, but its theoretical properties turn out to be difficult to analyze. So far, most optimality results are stated in an asymptotic fashion. In this paper we propose a concentration inequality for the difference between the cross-validation score and the risk functional with respect to the squared error. This gives a pre-asymptotic bound that holds with high probability. Our assumptions rely on bounds on the uniform error of the model, which allows for a broadly applicable framework. We support our claims by applying this machinery to Shepard's model, where we are able to determine precise constants in the concentration inequality. Numerical experiments in combination with fast algorithms indicate the applicability of our results.
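To make the compared quantities concrete: the cross-validation score measures each sample's label against a model fitted without that sample, while the risk is the expected squared error on fresh data. The sketch below is a minimal Python illustration of the leave-one-out cross-validation score for Shepard's inverse distance weighting model; the exponent `p`, the weight kernel, and the leave-one-out variant are assumptions for illustration, and the paper's fast algorithms are not reproduced (this naive loop costs O(n^2)).

```python
import numpy as np

def shepard(x_train, y_train, x_query, p=2.0):
    """Shepard's inverse distance weighting model (illustrative choice
    of kernel and exponent p; not necessarily the paper's setup)."""
    d = np.abs(x_query[:, None] - x_train[None, :])  # pairwise distances
    w = 1.0 / np.maximum(d, 1e-12) ** p              # inverse-distance weights
    return (w @ y_train) / w.sum(axis=1)             # weighted label average

def loo_cv_score(x_train, y_train, p=2.0):
    """Leave-one-out cross-validation score under squared error."""
    n = len(x_train)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                     # drop the i-th sample
        pred = shepard(x_train[mask], y_train[mask], x_train[i:i + 1], p)
        errs[i] = (y_train[i] - pred[0]) ** 2        # held-out squared error
    return errs.mean()

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
print("LOO-CV score:", loo_cv_score(x, y))
```

The concentration inequality of the paper then bounds, with high probability, how far such a score can deviate from the true risk of the fitted model.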