We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
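The following is a minimal illustrative sketch (not the paper's own code) of the phenomenon the abstract describes: ridgeless, i.e. minimum-norm, linear regression on random Fourier features with noisy labels, where the test error spikes near the interpolation threshold (number of features close to the number of training samples). All function names, dimensions, and noise levels below are illustrative assumptions.

```python
# Hypothetical sketch: ridgeless regression over random Fourier features,
# sweeping the feature count p across the interpolation threshold p ~ n_train.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 5           # assumed sample sizes and input dim
sigma_noise = 0.5                           # assumed label-noise std

def target(X):
    return np.sin(X @ np.ones(d))           # hypothetical ground-truth function

X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr = target(X_tr) + sigma_noise * rng.normal(size=n_train)
y_te = target(X_te)

for p in [20, 50, 90, 100, 110, 200, 500]:  # number of random Fourier features
    W = rng.normal(size=(d, p))             # random frequencies
    b = rng.uniform(0, 2 * np.pi, size=p)   # random phases
    feats = lambda X: np.sqrt(2.0 / p) * np.cos(X @ W + b)
    # Ridgeless fit: minimum-norm least-squares solution via the pseudoinverse.
    beta = np.linalg.pinv(feats(X_tr)) @ y_tr
    mse = np.mean((feats(X_te) @ beta - y_te) ** 2)
    print(f"p = {p:4d}  test MSE = {mse:.3f}")  # expect a peak near p ~ n_train
```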