We derive sanity-check bounds for the cross-validation (CV) estimate of the generalization risk of learning algorithms dedicated to extreme or rare events. We consider classification on extreme regions of the covariate space, a problem analyzed in Jalalzai et al. (2018). The risk is then the probability of error conditional on the norm of the covariate vector exceeding a high quantile. Establishing sanity-check bounds consists in recovering, for the CV estimate, bounds of the same nature as those known for the empirical risk. We achieve this goal both for K-fold CV, with an exponential bound, and for leave-p-out CV, with a polynomial bound, thus extending state-of-the-art results to the modified version of the risk adapted to extreme value analysis.
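To make the object of study concrete, the following is a minimal sketch of a K-fold CV estimate of the error rate restricted to the extreme region, i.e. to points whose norm exceeds a high empirical quantile. It is an illustration under stated assumptions, not the procedure analyzed in the paper: the function names, the choice to compute the threshold once on the full sample, the default level `alpha`, and the toy angular nearest-centroid classifier are all hypothetical.

```python
import numpy as np


def kfold_extreme_risk(X, y, fit, predict, n_folds=5, alpha=0.1, seed=0):
    """K-fold CV estimate of the error rate on the extreme region.

    Assumption: the extreme region is {x : ||x|| > t}, where t is the empirical
    (1 - alpha) quantile of the norms of the full sample.
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1)
    threshold = np.quantile(norms, 1.0 - alpha)
    extreme = norms > threshold

    # Random partition of the indices into n_folds validation blocks.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_risks = []
    for val_idx in folds:
        val_mask = np.zeros(len(y), dtype=bool)
        val_mask[val_idx] = True
        train = extreme & ~val_mask  # extreme points outside the fold
        val = extreme & val_mask     # extreme points inside the fold
        if not train.any() or not val.any():
            continue  # skip folds with no extreme point on either side
        model = fit(X[train], y[train])
        fold_risks.append(np.mean(predict(model, X[val]) != y[val]))
    if not fold_risks:
        raise ValueError("no fold contains extreme points; increase alpha")
    return float(np.mean(fold_risks))


def fit_angular_centroids(X, y):
    """Toy classifier (assumption): one centroid per class on the unit sphere."""
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.unique(y)
    centroids = np.stack([angles[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict_angular_centroids(model, X):
    """Assign each point to the class whose angular centroid is closest."""
    labels, centroids = model
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    dists = np.linalg.norm(angles[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]
```

The classifier here depends on the covariate only through its direction `x / ||x||`, a simplification chosen for this sketch. A leave-p-out variant would replace the random partition by all (or a random subsample of) validation subsets of size p.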