Recent works have examined how deep neural networks, which can solve a variety of difficult problems, incorporate the statistics of training data to achieve their success. However, existing results have been established only in limited settings. In this work, we derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent. We show theoretically that weight updates are aligned with input correlations from intermediate layers weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation. Finally, we test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation.
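To make the alignment claim concrete, below is a minimal NumPy sketch (not the paper's implementation; all names and the surrogate rule are illustrative assumptions). It measures, in a wide finite network at initialization, how well the true backpropagation update for a hidden layer aligns with a backprop-free surrogate built from the layer's input correlations weighted by the output error sent through the fixed readout weights, loosely in the spirit of an "Align-zero"-style rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, width, n_out, batch = 10, 2048, 1, 256

# Wide two-hidden-layer tanh network with 1/sqrt(fan-in) scaling.
W1 = rng.standard_normal((width, n_in)) / np.sqrt(n_in)
W2 = rng.standard_normal((width, width)) / np.sqrt(width)
W3 = rng.standard_normal((n_out, width)) / np.sqrt(width)

X = rng.standard_normal((batch, n_in))
y = rng.standard_normal((batch, n_out))

# Forward pass.
h1 = np.tanh(X @ W1.T)           # (batch, width)
h2 = np.tanh(h1 @ W2.T)          # (batch, width)
out = h2 @ W3.T                  # (batch, n_out)
err = out - y                    # output error for squared loss

# True backprop gradient for the second hidden layer's weights W2.
d2_bp = (err @ W3) * (1.0 - h2 ** 2)     # error backpropagated with local derivative
grad_W2_bp = d2_bp.T @ h1 / batch        # (width, width)

# Backprop-free surrogate (illustrative, not the paper's rule): send the output error
# back through the readout weights at initialization and correlate with the layer's
# inputs, i.e. "input correlations weighted by error", omitting the local derivative.
d2_free = err @ W3
grad_W2_free = d2_free.T @ h1 / batch

# Cosine similarity between the two weight updates, treated as flattened vectors.
cos = np.sum(grad_W2_bp * grad_W2_free) / (
    np.linalg.norm(grad_W2_bp) * np.linalg.norm(grad_W2_free)
)
print(f"cosine similarity (backprop vs. surrogate update): {cos:.3f}")
```

In this sketch the alignment is measured only at initialization, where the readout used by the surrogate coincides with the current readout; an adaptive variant would update the feedback pathway during training. The actual Align-zero and Align-ada rules in the paper may differ in these details.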