An evolving line of work in machine learning has observed empirical evidence suggesting that interpolating estimators -- ones that achieve zero training error -- may not necessarily be harmful. This paper pursues a theoretical understanding of an important type of interpolator: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size). We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon: the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (possibly more than two) phases of descent and ascent as the model capacity increases. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator, as well as the delicate interplay between the over-parameterization ratio and the sparsity level, thereby unveiling a fundamental geometric distinction from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
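For concreteness, the estimator under study admits the following standard formulation (the notation $n$, $p$, $k$, $X$, $\beta^{\star}$, $\varepsilon$ is introduced here for illustration and is not fixed by the text above):
\[
\widehat{\beta} \;\in\; \arg\min_{\beta \in \mathbb{R}^{p}} \|\beta\|_{1} \quad \text{subject to} \quad X\beta = y, \qquad \text{where } y = X\beta^{\star} + \varepsilon,
\]
with $X \in \mathbb{R}^{n \times p}$ a Gaussian design matrix and $\beta^{\star}$ a $k$-sparse signal; the linear-sparsity, proportional regime takes $p/n$ and $k/n$ to fixed constants as $n \to \infty$, and the generalization risk tracks (up to the noise level) the estimation error $\|\widehat{\beta} - \beta^{\star}\|_{2}^{2}$.

Because this program is a linear program, the risk curve can also be probed numerically. Below is a minimal simulation sketch under the assumptions just stated (our own illustration, not the paper's code or its exact risk metric); the parameter choices are arbitrary, and the shape of the resulting risk curve depends on the sparsity regime:

    # A minimal simulation sketch (illustrative only): compute the minimum
    # l1-norm interpolator via the standard LP reformulation beta = u - v
    # with u, v >= 0, and track the estimation error as p grows.
    import numpy as np
    from scipy.optimize import linprog

    def min_l1_interpolator(X, y):
        """Solve min ||beta||_1 subject to X beta = y as a linear program."""
        n, p = X.shape
        c = np.ones(2 * p)                # objective: sum(u) + sum(v) = ||beta||_1
        A_eq = np.hstack([X, -X])         # equality constraint: X(u - v) = y
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
        return res.x[:p] - res.x[p:]

    rng = np.random.default_rng(0)
    n, sigma = 100, 0.5                   # sample size and noise level (arbitrary)
    k = 10                                # linear sparsity: k/n is held fixed
    for p in [150, 300, 600, 1200]:       # sweep the over-parameterization ratio p/n
        X = rng.standard_normal((n, p)) / np.sqrt(n)   # Gaussian design
        beta_star = np.zeros(p)
        beta_star[:k] = 1.0               # k-sparse signal
        y = X @ beta_star + sigma * rng.standard_normal(n)
        beta_hat = min_l1_interpolator(X, y)           # interpolates: X beta_hat = y
        err = np.sum((beta_hat - beta_star) ** 2)
        print(f"p/n = {p / n:4.1f}   estimation error = {err:.3f}")

The LP reformulation is the standard device for $\ell_1$ minimization: the constraint set of the program is a polytope, and the polytopal geometry of the $\ell_1$ ball (in contrast to the rotation-invariant $\ell_2$ ball) underlies the geometric distinction drawn above.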