First, we analyze the variance of Cross-Validation (CV)-based estimators of the performance of classification rules. Second, we propose a novel estimator of this variance using the Influence Function (IF) approach, which has previously been used very successfully to estimate the variance of bootstrap-based estimators. The motivation for this research is that, to the best of our knowledge, the literature lacks a rigorous method for estimating the variance of CV-based estimators. What is available is a set of ad-hoc procedures that have no mathematical foundation, since they ignore the covariance structure among dependent random variables. The conducted experiments show that the proposed IF-based method has a small RMS error with some bias. Surprisingly, however, the ad-hoc methods still work better than the IF-based method. Unfortunately, this is due to a lack of sufficient smoothness relative to the bootstrap estimator. This opens three directions for further research: (1) a more comprehensive simulation study to clarify when the IF method wins or loses; (2) more mathematical analysis to figure out why the ad-hoc methods work well; and (3) more mathematical treatment to figure out the connection between the appropriate amount of "smoothness" and decreasing the bias of the IF method.