It is unclear how changing the learning rule of a deep neural network alters its learning dynamics and representations. To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically plausible alternatives including feedback alignment (FA), direct feedback alignment (DFA), and error-modulated Hebbian learning (Hebb), as well as gated linear networks (GLNs). We show that, for each of these learning rules, the evolution of the output function at infinite width is governed by a time-varying effective neural tangent kernel (eNTK). In the lazy training limit, this eNTK is static and does not evolve, while in the rich mean-field regime this kernel's evolution can be determined self-consistently with dynamical mean field theory (DMFT). This DMFT enables comparisons of the feature and prediction dynamics induced by each of these learning rules. In the lazy limit, we find that DFA and Hebb can only learn using last-layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between the feedforward and feedback weight matrices. In the rich regime, DFA and FA utilize a temporally evolving and depth-dependent NTK. Counterintuitively, we find that FA networks trained in the rich regime exhibit more feature learning when initialized with smaller correlation between the forward- and backward-pass weights. GLNs admit a very simple formula for their lazy-limit kernel and preserve conditional Gaussianity of their preactivations under gating functions. Error-modulated Hebbian rules show very little task-relevant alignment of their kernels and perform most task-relevant learning in the last layer.