Scaling Laws Beyond Backpropagation

Alternatives to backpropagation have long been studied to better understand how biological brains may learn. Recently, they have also garnered interest as a way to train neural networks more efficiently. By relaxing constraints inherent to backpropagation (e.g., symmetric feedforward and feedback weights, sequential updates), these methods open up promising prospects, such as local learning. However, the tradeoffs between different methods in terms of final task performance, convergence speed, and ultimately compute and data requirements are rarely outlined. In this work, we use scaling laws to study the ability of Direct Feedback Alignment (DFA) to train causal decoder-only Transformers efficiently. Scaling laws provide an overview of the tradeoffs implied by a modeling decision, up to extrapolating how it might transfer to increasingly large models. We find that DFA fails to offer more efficient scaling than backpropagation: there is never a regime in which the degradation in loss incurred by using DFA is worth the potential reduction in compute budget. Our finding is at variance with previous beliefs in the alternative training methods community, and highlights the need for holistic empirical approaches to better understand modeling decisions.


Related content

Backpropagation


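Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used. The term is nevertheless often used loosely for the entire learning algorithm, including how the gradient is used, for example by stochastic gradient descent. Backpropagation generalizes the gradient computation of the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or "reverse mode"). In machine learning, backpropagation (backprop) is a widely used algorithm for training feedforward neural networks for supervised learning. Generalizations of backpropagation exist for other artificial neural networks (ANNs); this class of algorithms is generically referred to as "backpropagation". The algorithm works by computing the gradient of the loss function with respect to each weight via the chain rule, one layer at a time, iterating backward from the last layer so that intermediate terms of the chain rule are not computed redundantly.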
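The layer-by-layer description above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper: the two-layer tanh network, the squared-error loss, and the names `forward`, `loss_fn`, and `backprop` are all assumptions chosen for the example. Note how `delta2`, the error signal at the last layer, is computed once and then reused to obtain `delta1`, which is the reuse of intermediate chain-rule terms that the paragraph describes.

```python
import numpy as np


def forward(params, x):
    """Hypothetical two-layer network: x -> tanh(x W1 + b1) -> h1 W2 + b2."""
    W1, b1, W2, b2 = params
    a1 = x @ W1 + b1          # pre-activation of the hidden layer
    h1 = np.tanh(a1)          # hidden activation
    y_hat = h1 @ W2 + b2      # linear output layer
    return a1, h1, y_hat


def loss_fn(params, x, y):
    """Squared-error loss, averaged over the batch (an assumed choice)."""
    _, _, y_hat = forward(params, x)
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))


def backprop(params, x, y):
    """Gradients of loss_fn w.r.t. each parameter, from the last layer backward."""
    W1, b1, W2, b2 = params
    n = x.shape[0]
    a1, h1, y_hat = forward(params, x)

    # Last layer: derivative of the loss w.r.t. the network output.
    delta2 = (y_hat - y) / n
    gW2 = h1.T @ delta2
    gb2 = delta2.sum(axis=0)

    # Hidden layer: reuse delta2 via the chain rule, then multiply by
    # the derivative of tanh, which is 1 - tanh(a1)^2 = 1 - h1^2.
    delta1 = (delta2 @ W2.T) * (1.0 - h1 ** 2)
    gW1 = x.T @ delta1
    gb1 = delta1.sum(axis=0)

    return [gW1, gb1, gW2, gb2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = [rng.normal(size=(3, 4)), np.zeros(4),
              rng.normal(size=(4, 2)), np.zeros(2)]
    x, y = rng.normal(size=(8, 3)), rng.normal(size=(8, 2))

    # Using the gradient is a separate step: here, one plain gradient-descent update.
    before = loss_fn(params, x, y)
    grads = backprop(params, x, y)
    params = [p - 0.01 * g for p, g in zip(params, grads)]
    print(before, loss_fn(params, x, y))
```

The `__main__` section keeps the two roles apart, as the text does: `backprop` only computes gradients, and the update rule that consumes them (a single gradient-descent step here) is a separate choice.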
