We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced.
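The abstract's central claim, that gradient flow on a randomly initialized deep linear network lands close to the minimum $\ell_2$-norm interpolant, can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup: the problem size, depth, step size, initialization scale, and the use of plain gradient descent as a discrete stand-in for gradient flow are all illustrative assumptions. It trains a product of weight matrices on an overparameterized regression problem and compares the resulting end-to-end predictor with the minimum-norm interpolant computed from the pseudoinverse.

```python
import numpy as np

# Toy illustration (not the paper's setup): train a depth-L linear network
# f(x) = x @ W1 @ ... @ WL with gradient descent (a stand-in for gradient
# flow) on an overparameterized problem, then compare the end-to-end
# predictor to the minimum l2-norm interpolant.  All hyperparameters below
# (n, d, L, step size, initialization scale) are illustrative assumptions.

rng = np.random.default_rng(0)
n, d, L = 10, 40, 3                      # n < d: many interpolating solutions
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# Random initialization: inner factors are d x d, the last factor is d x 1.
alpha = 0.5                              # initialization scale (assumption)
Ws = [alpha * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L - 1)]
Ws.append(alpha * rng.standard_normal((d, 1)) / np.sqrt(d))

def end_to_end(Ws):
    """Collapse the factors into a single d x 1 linear predictor."""
    P = Ws[0]
    for W in Ws[1:]:
        P = P @ W
    return P

lr, steps = 0.01, 30000
for _ in range(steps):
    beta = end_to_end(Ws)                                   # d x 1
    grad_beta = X.T @ (X @ beta[:, 0] - y)[:, None] / n     # grad w.r.t. the product
    new_Ws = []
    for k in range(L):
        # Chain rule: dL/dW_k = (W_1...W_{k-1})^T (dL/dbeta) (W_{k+1}...W_L)^T
        prefix = np.eye(d)
        for W in Ws[:k]:
            prefix = prefix @ W
        suffix = np.eye(Ws[k].shape[1])
        for W in Ws[k + 1:]:
            suffix = suffix @ W
        new_Ws.append(Ws[k] - lr * prefix.T @ grad_beta @ suffix.T)
    Ws = new_Ws

beta_deep = end_to_end(Ws)[:, 0]
beta_min_norm = np.linalg.pinv(X) @ y                       # minimum l2-norm interpolant
print("train residual norm:", np.linalg.norm(X @ beta_deep - y))
print("relative distance to min-norm interpolant:",
      np.linalg.norm(beta_deep - beta_min_norm) / np.linalg.norm(beta_min_norm))
```

The first printout indicates whether the network has interpolated the training data; the second measures how far the end-to-end predictor is from the minimum-norm solution, which the abstract's results suggest should be small under suitable random initialization, with the exact gap depending on the initialization scale and depth.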