We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard-margin linear support vector machine solution, regardless of depth.
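To make the frequency-domain penalty concrete, the sketch below only evaluates $\|\hat{\beta}\|_{2/L}$, the $\ell_{2/L}$ bridge (quasi-)norm of the discrete Fourier transform $\hat{\beta}$ of a linear predictor $\beta$. It does not run gradient descent or reproduce the convergence result. It assumes the unnormalized DFT and the usual definition $\|x\|_p = (\sum_k |x_k|^p)^{1/p}$. The helper names `dft` and `frequency_bridge_penalty` are hypothetical, and the paper's exact normalization may differ.

```python
import cmath
import math


def dft(beta):
    """Unnormalized DFT: beta_hat[k] = sum_n beta[n] * exp(-2j*pi*k*n/D)."""
    D = len(beta)
    return [
        sum(beta[n] * cmath.exp(-2j * math.pi * k * n / D) for n in range(D))
        for k in range(D)
    ]


def frequency_bridge_penalty(beta, depth):
    """Hypothetical helper: ||DFT(beta)||_p with p = 2/depth.

    depth = 1 gives p = 2 (an l2 norm); depth = 2 gives p = 1;
    for depth >= 3, p < 1 and the result is only a quasi-norm.
    """
    p = 2.0 / depth
    mags = [abs(c) for c in dft(beta)]
    return sum(m ** p for m in mags) ** (1.0 / p)


D = 8
# Two predictors with the same l2 norm (both equal 1):
spike = [1.0] + [0.0] * (D - 1)  # flat spectrum: |beta_hat[k]| = 1 for all k
cosine = [0.5 * math.cos(2 * math.pi * n / D) for n in range(D)]  # two nonzero frequencies

for depth in (1, 2, 3):
    print(depth, frequency_bridge_penalty(spike, depth), frequency_bridge_penalty(cosine, depth))
```

For depth 1 the two predictors receive the same penalty (by Parseval's identity the frequency-domain $\ell_2$ norm is proportional to $\|\beta\|_2$), while for depth 2 and above the cosine, whose energy sits in only two frequencies, is penalized less than the spike. This illustrates why a bias toward small $\ell_{2/L}$ norm in the frequency domain favors predictors that are sparse in frequency as $L$ grows.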