Modern machine learning often operates in a regime where the number of parameters far exceeds the number of data points, achieving zero training loss and yet good generalization, thereby contradicting the classical bias-variance trade-off. This \textit{benign overfitting} phenomenon has recently been characterized through so-called \textit{double descent} curves, in which the risk undergoes a second descent (in addition to the classical U-shaped learning curve obtained when the number of parameters is small) as the number of parameters grows beyond a certain threshold. In this paper, we examine the conditions under which benign overfitting occurs in random feature (RF) models, i.e., two-layer neural networks with fixed first-layer weights. We adopt a new view of random features and show that benign overfitting arises from the noise residing in such features (noise that may already be present in the data and propagate to the features, or that may be added by the user to the features directly), which plays an important implicit regularization role in the phenomenon.
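For concreteness, a standard formulation of the RF model described above is sketched below; the notation ($N$, $a_i$, $w_i$, $\sigma$) is illustrative and not taken from this abstract:
\[
f(x) \;=\; \sum_{i=1}^{N} a_i \,\sigma\!\left(\langle w_i, x \rangle\right),
\]
where the first-layer weights $w_i$ are drawn at random and held fixed, $\sigma$ is a nonlinear activation, and only the second-layer coefficients $a_i$ are fit to the data, typically by (ridgeless) least squares. Benign overfitting and double descent concern the interpolating regime where the number of features $N$ exceeds the number of training points.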