Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. Specifically, for both linear regression and classification with a sparse ground truth, we prove that minimum $\ell_p$-norm and maximum $\ell_p$-margin interpolators achieve fast polynomial rates close to order $1/n$ for $p > 1$ compared to a logarithmic rate for $p = 1$. Finally, we provide preliminary experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.
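For concreteness, the two estimators named above can be sketched as follows. The notation is assumed here for illustration and is not fixed by the abstract: $x_i \in \mathbb{R}^d$ are the inputs, $y_i$ the (noisy) observations or labels, and $n$ the number of samples. The minimum $\ell_p$-norm interpolator for regression would then be

$$
\hat{w}_{\mathrm{reg}} \in \arg\min_{w \in \mathbb{R}^d} \|w\|_p \quad \text{s.t.} \quad \langle x_i, w \rangle = y_i, \quad i = 1, \dots, n,
$$

and the maximum $\ell_p$-margin interpolator for classification with labels $y_i \in \{-1, +1\}$ can be written, in its usual equivalent form, as

$$
\hat{w}_{\mathrm{cls}} \in \arg\min_{w \in \mathbb{R}^d} \|w\|_p \quad \text{s.t.} \quad y_i \langle x_i, w \rangle \ge 1, \quad i = 1, \dots, n.
$$

Under this reading, $p = 1$ is the strongest sparsity-inducing bias among $p \ge 1$, and the trade-off described above concerns how the choice of $p$ balances alignment with a sparse ground truth against fitting the noise in $y_i$.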