Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. Standard confidence intervals for test error built from CV estimates have been shown to have coverage that may fall below nominal levels. This occurs because each sample is used for both training and testing during CV, so the CV estimates of the errors become correlated. If this correlation is not accounted for, the estimate of the variance is smaller than it should be. One way to mitigate this issue is to instead estimate the mean squared error of the prediction error using nested CV. This approach has been shown to achieve better coverage than intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
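To make the coverage issue concrete, the following is a minimal sketch of the standard (naive) CV confidence interval that the abstract describes as potentially under-covering. It is not the nested CV procedure and not the Cox model setting: squared error loss, a one-predictor least-squares fit, and the names `fit_line` and `naive_cv_interval` are all assumptions chosen for illustration. The point is only to show where the independence assumption enters the standard error.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (illustrative model only)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def naive_cv_interval(xs, ys, n_folds=5, z=1.96, seed=0):
    """Return (estimate, lower, upper) for test error from K-fold CV.

    The standard error below treats the n held-out losses as independent.
    Because every sample is also used for training in the other folds,
    the losses are correlated and this standard error tends to be too small.
    """
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    losses = []
    for k in range(n_folds):
        test = idx[k::n_folds]  # every n_folds-th shuffled index forms fold k
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Squared error on the held-out fold (assumed loss, not from the paper).
        losses.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    est = statistics.fmean(losses)
    se = statistics.stdev(losses) / math.sqrt(n)  # naive: ignores correlation
    return est, est - z * se, est + z * se
```

Calling `naive_cv_interval(xs, ys)` on two equal-length lists of floats returns the point estimate and the interval endpoints. A nested CV approach would replace the naive `se` with an estimate of the mean squared error of the CV estimate, which this sketch does not attempt.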