Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double or multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for risk monotonization based on cross-validation. The framework takes as input a generic prediction procedure and returns a modified procedure whose out-of-sample prediction risk is, asymptotically, monotonic in the limiting aspect ratio. As part of our framework, we propose two data-driven methodologies, the zero-step and one-step procedures, which are akin to bagging and boosting, respectively, and show that, under very mild assumptions, they provably achieve monotonic asymptotic risk behavior. Our results apply to a broad variety of prediction procedures and loss functions, and do not require a well-specified (parametric) model. We exemplify our framework with concrete analyses of the minimum $\ell_2$-norm and minimum $\ell_1$-norm least squares prediction procedures. As one of the ingredients in our analysis, we also derive novel additive and multiplicative forms of oracle risk inequalities for split cross-validation that are of independent interest.
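The abstract does not spell out the zero-step procedure, so the following is only a minimal sketch of one plausible reading: hold out a validation split, fit the base procedure on subsamples of several sizes (optionally averaging a few fits, in the spirit of bagging), and keep the subsample size with the smallest validation risk. The function names `min_norm_ls` and `zero_step_sketch`, the squared-error loss, the grid of subsample sizes, and the use of the minimum $\ell_2$-norm least squares fit as the base procedure are all assumptions made for illustration, not the authors' exact algorithm.

```python
import numpy as np


def min_norm_ls(X, y):
    """Minimum l2-norm least squares coefficients via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def zero_step_sketch(X, y, n_val, grid, n_bags=1, seed=0):
    """Hypothetical zero-step-style selection by split cross-validation.

    For each subsample size k in `grid`, fit the base procedure on
    `n_bags` random subsamples of the training split, average the fits,
    and estimate the risk on the held-out validation split.
    Returns (best_k, best_beta, risks), where risks maps k -> validation risk.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    val_idx, tr_idx = perm[:n_val], perm[n_val:]
    X_tr, y_tr = X[tr_idx], y[tr_idx]
    X_val, y_val = X[val_idx], y[val_idx]

    risks, betas = {}, {}
    for k in grid:
        fits = []
        for _ in range(n_bags):
            # Subsample k training points without replacement.
            sub = rng.choice(X_tr.shape[0], size=k, replace=False)
            fits.append(min_norm_ls(X_tr[sub], y_tr[sub]))
        beta = np.mean(fits, axis=0)
        # Squared-error validation risk (an assumed loss).
        risks[k] = float(np.mean((y_val - X_val @ beta) ** 2))
        betas[k] = beta

    best_k = min(risks, key=risks.get)
    return best_k, betas[best_k], risks
```

Because the full training-split size can be included in `grid`, the selected predictor's validation risk is never worse than that of the base procedure fitted on the whole training split, which is the intuition behind using cross-validation to remove the non-monotonic risk behavior.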