Cross-validation (CV) is one of the most popular tools for assessing and selecting predictive models. However, standard CV suffers from high computational cost when the number of folds is large. Recently, under the empirical risk minimization (ERM) framework, a line of work has proposed efficient methods to approximate CV based on the solution of the ERM problem trained on the full dataset. However, in large-scale problems, it can be hard to obtain the exact solution of the ERM problem, either due to limited computational resources or due to early stopping as a way of preventing overfitting. In this paper, we propose a new paradigm to efficiently approximate CV when the ERM problem is solved via an iterative first-order algorithm, without running it until convergence. Our new method extends existing guarantees for CV approximation to hold along the whole trajectory of the algorithm, including at convergence, thus generalizing existing CV approximation methods. Finally, we illustrate the accuracy and computational efficiency of our method through a range of empirical studies.
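To make the idea of approximating CV from a single full-data fit concrete, here is a minimal sketch for the special case of ridge regression, where the leave-one-out residuals can be recovered exactly from one fit via the classical shortcut \(e_i / (1 - h_{ii})\), with \(h_{ii}\) the leverage of the ridge hat matrix. This is an illustrative example of the "approximate CV from the full-data ERM solution" idea, not the method proposed in the paper; all names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 1.0  # illustrative problem size and ridge penalty
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Fit ridge regression once on the full dataset (the full-data ERM solution).
A = X.T @ X + lam * np.eye(p)
beta = np.linalg.solve(A, X.T @ y)
resid = y - X @ beta

# Leverages h_ii of the ridge hat matrix X (X'X + lam I)^{-1} X'.
h = np.einsum("ij,ij->i", X, np.linalg.solve(A, X.T).T)

# Leave-one-out residuals from the single full fit: e_i / (1 - h_ii).
loo_shortcut = resid / (1.0 - h)

# Brute-force leave-one-out CV for comparison: n separate refits.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ai = X[mask].T @ X[mask] + lam * np.eye(p)
    bi = np.linalg.solve(Ai, X[mask].T @ y[mask])
    loo_brute[i] = y[i] - X[i] @ bi
```

For ridge regression the shortcut is exact (by the Sherman-Morrison identity), so `loo_shortcut` matches `loo_brute` to numerical precision while avoiding the n refits; the paper's contribution is to extend this kind of single-fit approximation to iterates of a first-order algorithm that has not yet converged.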