A growing body of recent research pursues the far-reaching goal of finding training methods for deep neural networks that can serve as alternatives to backpropagation (BP). A prominent example is predictive coding (PC), a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models. These methods, however, fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions. In this work, we address this problem by generalizing PC to arbitrary probability distributions, enabling the training of architectures, such as transformers, that are hard to approximate under purely Gaussian assumptions. We perform three experimental analyses. First, we study the gap between our method and the standard formulation of PC on multiple toy examples. Second, we evaluate variational autoencoders, where our method reaches the same reconstruction quality as BP. Third, we show that our method allows us to train transformer networks and achieve performance comparable to BP on conditional language models. More broadly, this method allows neuroscience-inspired learning to be applied to multiple domains, since the internal distributions can be flexibly adapted to the data, tasks, and architectures used.
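To make the contrast concrete, the sketch below illustrates (under our own simplifying assumptions, not the paper's exact formulation) the difference between a standard Gaussian PC energy, built from squared layer-wise prediction errors, and a generalized energy in which each layer contributes the negative log-likelihood of an arbitrary distribution. The function names (`gaussian_pc_energy`, `general_pc_energy`) and the specific predictors are hypothetical placeholders for illustration only.

```python
import torch

# Illustrative sketch only: in predictive coding, inference runs gradient
# descent on the layer states to minimize an energy, and learning runs
# gradient descent on the weights of the same energy.

def gaussian_pc_energy(states, weights, act=torch.tanh):
    """Standard (Gaussian) PC: F = sum_l 0.5 * ||x_l - W_l f(x_{l+1})||^2."""
    energy = torch.zeros(())
    for l, W in enumerate(weights):
        pred = act(states[l + 1]) @ W.T          # top-down prediction of layer l
        energy = energy + 0.5 * ((states[l] - pred) ** 2).sum()
    return energy

def general_pc_energy(states, predictors, neg_log_liks):
    """Generalized PC: F = sum_l -log p_l(x_l | g_l(x_{l+1})), for arbitrary p_l."""
    energy = torch.zeros(())
    for l, (g, nll) in enumerate(zip(predictors, neg_log_liks)):
        energy = energy + nll(states[l], g(states[l + 1]))  # any differentiable NLL
    return energy
```

In the Gaussian case the negative log-likelihood reduces (up to constants) to the squared error above; swapping in other differentiable log-likelihoods is what lets the same inference-and-learning loop cover layers and activations that a Gaussian model approximates poorly.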