We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X \theta + w$ with random design $X \in \mathbb{R}^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$. We precisely characterize how prediction (test) error necessarily scales with training error in this setting. An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and will suffer growth in excess prediction error at least linear in the training error. Thus, optimal performance requires fitting training data to substantially higher accuracy than the inherent noise floor of the problem.
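To make the setting concrete, here is a minimal simulation sketch (not from the paper) under assumed isotropic Gaussian design and a unit-scale signal, choices the abstract does not specify. It contrasts the minimum-norm interpolator, which drives training error to zero, with a deliberately non-interpolating ridge estimator in the regime $d/n = \gamma > 1$ with small noise $\sigma$; the hypothetical penalty level `lam` is for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n, gamma = 200, 2.0          # proportional regime: d/n = gamma > 1
d = int(gamma * n)
sigma = 0.05                 # small label noise standard deviation

# Assumed ground-truth signal and isotropic Gaussian design (illustrative
# choices, not the paper's exact assumptions).
theta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta + sigma * rng.normal(size=n)

# Minimum-norm interpolator: fits the training data exactly (n < d).
theta_interp = np.linalg.pinv(X) @ y

# A non-interpolating estimator: ridge with a fixed penalty, which leaves
# nonzero training error on the table.
lam = 1.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def train_error(th):
    """Empirical mean squared training error."""
    return np.mean((X @ th - y) ** 2)

def excess_test_error(th):
    """Excess prediction risk E[(x^T (th - theta))^2] for isotropic Gaussian x."""
    return np.sum((th - theta) ** 2)

for name, th in [("min-norm interpolator", theta_interp),
                 ("ridge (lam = 1)", theta_ridge)]:
    print(f"{name:>22}: train MSE = {train_error(th):.2e}, "
          f"excess test risk = {excess_test_error(th):.2e}")
```

Rerunning this sketch with smaller values of `sigma` gives a rough sense of the trade-off the abstract describes: as the noise floor shrinks, an estimator whose training error stays well above that floor pays for it in excess prediction error.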