We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices with no non-linearity in between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that overparametrization has no effect and there is no implicit bias in this setting: training such a deep, overparametrized network is perfectly equivalent to training a one-layer shallow network.
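To make the claimed equivalence concrete, below is a minimal NumPy sketch (not from the paper) comparing the two trainings on the orthogonal group. The quadratic loss f(W) = ½‖W − A‖²_F, the exponential-map retraction, and the step sizes are all illustrative assumptions. At the gradient-flow level, projecting each factor's Euclidean gradient onto its tangent space and summing the contributions shows that the product moves along its own Riemannian gradient at `depth` times the base speed, so the shallow run below uses a depth-scaled step size; the two discrete trajectories then agree to first order in the step size.

```python
import numpy as np
from scipy.linalg import expm

def skew(M):
    """Skew-symmetric part: skew(M) = (M - M^T) / 2."""
    return 0.5 * (M - M.T)

def riem_step(W, G, lr):
    """One Riemannian gradient-descent step on the orthogonal group,
    using the geodesic (exponential-map) retraction:
    W <- W exp(-lr * skew(W^T G)), where G is the Euclidean gradient."""
    return W @ expm(-lr * skew(W.T @ G))

def prod(mats, n):
    """Ordered product of a (possibly empty) list of n x n matrices."""
    out = np.eye(n)
    for M in mats:
        out = out @ M
    return out

rng = np.random.default_rng(0)
n, depth, lr, steps = 4, 3, 1e-2, 200

# Illustrative loss (an assumption, not from the paper):
# f(W) = 0.5 * ||W - A||_F^2 with orthogonal target A, so grad f(W) = W - A.
A, _ = np.linalg.qr(rng.standard_normal((n, n)))
grad_f = lambda W: W - A

# Deep network: `depth` orthogonal factors whose product starts at W0;
# shallow network: a single orthogonal matrix starting at the same W0.
W0, _ = np.linalg.qr(rng.standard_normal((n, n)))
Ws = [W0] + [np.eye(n) for _ in range(depth - 1)]
W_shallow = W0.copy()

for _ in range(steps):
    # Deep: one Riemannian step on every factor simultaneously.
    G = grad_f(prod(Ws, n))
    new_Ws = []
    for i, Wi in enumerate(Ws):
        P_l, P_r = prod(Ws[:i], n), prod(Ws[i + 1:], n)
        Gi = P_l.T @ G @ P_r.T  # chain rule: Euclidean gradient w.r.t. factor i
        new_Ws.append(riem_step(Wi, Gi, lr))
    Ws = new_Ws

    # Shallow: one Riemannian step with the depth-scaled step size.
    W_shallow = riem_step(W_shallow, grad_f(W_shallow), depth * lr)

print("deep-vs-shallow trajectory gap:",
      np.linalg.norm(prod(Ws, n) - W_shallow))
print("orthogonality defect of product:",
      np.linalg.norm(prod(Ws, n).T @ prod(Ws, n) - np.eye(n)))
```

With a small step size the printed gap stays near zero and both runs converge to the same point, consistent with the abstract's claim that the deep, overparametrized network behaves exactly like a one-layer shallow one.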