Cross-validation is a widely used technique for estimating prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallows' Cp. In addition, the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, there are correlations among the measured accuracies for each fold, and so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and we show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail.
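The naive interval being critiqued above can be sketched in a few lines. The code below is an illustrative example, not the paper's method: it computes the standard K-fold cross-validation estimate of squared prediction error for OLS, together with the usual normal-approximation confidence interval that treats the n pointwise errors as independent (the very assumption the text argues is violated, since points sharing training folds are correlated). The function name and all parameters are hypothetical.

```python
import numpy as np

def naive_cv_interval(X, y, k=10, seed=0):
    """Standard K-fold CV estimate of OLS prediction error, with the
    naive normal-approximation confidence interval. Illustrative only:
    this is the interval the abstract says can under-cover, because the
    pointwise errors are treated as independent when they are not."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    errs = np.empty(n)
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Fit OLS on the k-1 training folds via least squares.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Squared prediction error on the held-out fold.
        errs[fold] = (y[fold] - X[fold] @ beta) ** 2
    est = errs.mean()
    # Naive standard error: ignores correlations among errors induced
    # by shared training data, so it tends to be too small.
    se = errs.std(ddof=1) / np.sqrt(n)
    z = 1.96  # approximate 97.5% standard-normal quantile
    return est, (est - z * se, est + z * se)
```

The paper's nested cross-validation scheme replaces the naive standard error above with a more accurate variance estimate; it is not implemented here.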