The Implicit Bias of Benign Overfitting

The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining near-optimal expected loss, has received much attention in recent years, but it is still not fully understood beyond well-specified linear regression setups. In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks. We consider a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension $k$ is concatenated with a high-dimensional distribution. For linear regression that is not necessarily well-specified, we show that the minimum-norm interpolating predictor (to which standard training methods converge) is biased towards an inconsistent solution in general, hence benign overfitting will generally not occur. Moreover, we show how this can be extended beyond standard linear regression, via an argument showing that the existence of benign overfitting on some regression problems precludes its existence on other regression problems. We then turn to classification problems, and show that the situation there is much more favorable. Specifically, we prove that the max-margin predictor (to which standard training methods are known to converge in direction) is asymptotically biased towards minimizing a weighted \emph{squared hinge loss}. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the misclassification error, and we use this reduction to show benign overfitting in some new settings.

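
To make the regression setting concrete, the following is a minimal sketch of a minimum-norm interpolating predictor on data of the concatenated form described in the abstract. The specific choices here are assumptions made for illustration only and are not taken from the paper: the Gaussian inputs, the $1/\sqrt{d}$ scaling of the high-dimensional block, the toy label model, and the names `min_norm_interpolator`, `X_low`, `X_high`, and `w_star`. The sketch only demonstrates interpolation and norm minimality; it makes no claim about consistency.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum-L2-norm w satisfying X w = y.

    Uses the Moore-Penrose pseudoinverse; an exact interpolator exists
    when X has full row rank (more features than samples, generic data).
    """
    return np.linalg.pinv(X) @ y


rng = np.random.default_rng(0)
n, k, d = 50, 2, 2000  # samples, fixed low dimension, high-dimensional block

# Hypothetical instance of the concatenated data model:
# k arbitrary coordinates followed by d small-scale Gaussian coordinates.
X_low = rng.standard_normal((n, k))
X_high = rng.standard_normal((n, d)) / np.sqrt(d)
X = np.hstack([X_low, X_high])

# Toy noisy labels that depend only on the first k coordinates (assumption).
w_star = np.array([1.0, -1.0])
y = X_low @ w_star + 0.5 * rng.standard_normal(n)

w_hat = min_norm_interpolator(X, y)

# The predictor fits the noisy training labels up to numerical error.
train_residual = np.max(np.abs(X @ w_hat - y))

# Any other interpolator differs from w_hat by a null-space vector of X,
# which is orthogonal to w_hat and therefore can only increase the norm.
v = rng.standard_normal(k + d)
v_null = v - np.linalg.pinv(X) @ (X @ v)
other_interpolator = w_hat + v_null
```

Here `other_interpolator` also fits the training data exactly but has a strictly larger norm than `w_hat`, which is the sense in which the pseudoinverse solution is the minimum-norm interpolator.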