The biological implausibility of backpropagation (BP) has motivated many alternative, brain-inspired algorithms that attempt to rely only on local information, such as predictive coding (PC) and equilibrium propagation. However, these algorithms have notoriously struggled to train very deep networks, preventing them from competing with BP in large-scale settings. Indeed, scaling PC networks (PCNs) has recently been posed as a challenge for the community (Pinchetti et al., 2024). Here, we show that 100+ layer PCNs can be trained reliably using a Depth-$μ$P parameterisation (Yang et al., 2023; Bordelon et al., 2023), which we call "$μ$PC". By analysing the scaling behaviour of PCNs, we reveal several pathologies that make standard PCNs difficult to train at large depths. We then show that, despite addressing only some of these instabilities, $μ$PC allows stable training of very deep (up to 128-layer) residual networks on simple classification tasks, with performance competitive with current benchmarks and little tuning. Moreover, $μ$PC enables zero-shot transfer of both weight and activity learning rates across widths and depths. Our results serve as a first step towards scaling PC to more complex architectures and have implications for other local algorithms. Code for $μ$PC is made available as part of a JAX library for PCNs.
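To make the setup concrete, the following is a minimal sketch of a Depth-$μ$P-style residual PCN in JAX. It is illustrative only, under stated assumptions: a standard squared-error PC energy, a $1/\sqrt{L}$ scaling of the residual branches (the depth-scaling ingredient of Depth-$μ$P), and gradient-descent updates on activities and weights with hypothetical learning rates `lr_z` and `lr_w`. The width-dependent learning-rate scalings prescribed by $μ$P are omitted, and none of the names below are taken from the released $μ$PC library.

```python
# Illustrative sketch (not the muPC library API): a Depth-muP-style residual
# PCN with local activity (inference) and weight (learning) updates.
import jax
import jax.numpy as jnp

L, N = 8, 64  # depth (number of residual blocks) and width; values are arbitrary
key = jax.random.PRNGKey(0)
keys = jax.random.split(key, L)
# Weights initialised at O(1/sqrt(N)) scale; the 1/sqrt(L) factor inside the
# energy is the Depth-muP residual-branch scaling meant to stabilise training
# as depth grows.
Ws = jnp.stack([jax.random.normal(k, (N, N)) / jnp.sqrt(N) for k in keys])

def energy(Ws, zs, x):
    """Predictive coding energy: sum of squared prediction errors over layers."""
    F, prev = 0.0, x
    for l in range(L):
        pred = prev + Ws[l] @ jax.nn.relu(prev) / jnp.sqrt(L)  # residual prediction
        F += 0.5 * jnp.sum((zs[l] - pred) ** 2)
        prev = zs[l]
    return F

def pc_step(Ws, zs, x, lr_z=0.1, lr_w=0.01, n_infer=20):
    """Run inference (activity updates) to reduce the energy, then one weight update.
    Both updates are local in the sense that they follow gradients of the same energy."""
    for _ in range(n_infer):
        zs = zs - lr_z * jax.grad(energy, argnums=1)(Ws, zs, x)
    Ws = Ws - lr_w * jax.grad(energy, argnums=0)(Ws, zs, x)
    return Ws, zs

x = jax.random.normal(key, (N,))
zs = jnp.zeros((L, N))  # activities; a feedforward initialisation could also be used
Ws, zs = pc_step(Ws, zs, x)
```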