Overparameterization in deep learning is powerful: very large models fit the training data perfectly and yet often generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a "double-descent" behavior: (1) the risk (expected out-of-sample prediction error) can grow arbitrarily when the number of parameters $p$ approaches the number of samples $n$, and (2) the risk decreases with $p$ for $p>n$, sometimes achieving a lower value than the lowest risk for $p<n$. The divergence of the risk for OLS can be avoided with regularization. In this work, we show that for some data models it can also be avoided with a PCA-based dimensionality reduction (PCA-OLS, also known as principal component regression). We provide non-asymptotic bounds for the risk of PCA-OLS by considering the alignments of the population and empirical principal components. We show that dimensionality reduction improves robustness while OLS is arbitrarily susceptible to adversarial attacks, particularly in the overparameterized regime. We compare PCA-OLS theoretically and empirically with a wide range of projection-based methods, including random projections, partial least squares (PLS), and certain classes of linear two-layer neural networks. These comparisons are made for different data generation models to assess the sensitivity to the signal-to-noise ratio and to the alignment of the regression coefficients with the features. We find that methods in which the projection depends on the training data can outperform methods where the projections are chosen independently of the training data, even those with oracle knowledge of population quantities, another seemingly paradoxical phenomenon that has been identified previously. This suggests that overparameterization may not be necessary for good generalization.
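To make the OLS versus PCA-OLS comparison concrete, the following is a minimal sketch in Python/NumPy. It is not the paper's experimental setup: the Gaussian data model, noise level, and number of retained components are illustrative assumptions. It contrasts the minimum-norm OLS solution with principal component regression as $p$ sweeps past $n$; the OLS risk typically spikes near $p \approx n$, while projecting onto a few empirical principal components keeps the risk bounded.

```python
import numpy as np

# Illustrative comparison of OLS and PCA-OLS (principal component regression)
# on synthetic data as the number of features p sweeps past the number of
# samples n. All data-generation choices below are assumptions for the sketch.

rng = np.random.default_rng(0)
n, n_test, k = 50, 1000, 10              # training samples, test samples, PCA components

for p in [10, 25, 45, 50, 55, 100, 200]:
    beta = rng.normal(size=p) / np.sqrt(p)        # ground-truth coefficients
    X = rng.normal(size=(n, p))                   # training features
    y = X @ beta + 0.5 * rng.normal(size=n)       # noisy responses
    X_test = rng.normal(size=(n_test, p))
    y_test = X_test @ beta                        # noiseless test targets

    # OLS: minimum-norm least-squares solution (pinv also handles p > n)
    beta_ols = np.linalg.pinv(X) @ y

    # PCA-OLS: project onto the top-k empirical principal components, then OLS
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[: min(k, p)].T                       # top empirical principal directions
    beta_pcr = V_k @ (np.linalg.pinv(X @ V_k) @ y)

    err_ols = np.mean((X_test @ beta_ols - y_test) ** 2)
    err_pcr = np.mean((X_test @ beta_pcr - y_test) ** 2)
    print(f"p={p:4d}  OLS risk={err_ols:9.3f}  PCA-OLS risk={err_pcr:9.3f}")
```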