Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure, consisting of independently training the 4 main blocks of layers of a ResNet-50 with Barlow Twins' loss function at each block, performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
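To make the blockwise procedure concrete, below is a minimal PyTorch-style sketch of the idea, assuming a ResNet-50 split into its 4 residual stages (with the stem folded into the first block). The projector shapes, hyperparameters, and the simplified barlow_twins_loss helper are illustrative placeholders, not the paper's exact configuration; the key point is that each block receives detached inputs, so it is trained only by its own local Barlow Twins loss rather than by gradients propagated from later blocks.

```python
import torch
import torch.nn as nn
import torchvision

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # Simplified Barlow Twins objective: decorrelate embedding dimensions of two
    # augmented views while keeping the diagonal of their cross-correlation at 1.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                       # cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()      # push diagonal toward 1
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # push rest toward 0
    return on_diag + lambd * off_diag

resnet = torchvision.models.resnet50()
stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
# Split the network into 4 blocks, one per ResNet stage (stem merged into block 1).
blocks = nn.ModuleList([
    nn.Sequential(stem, resnet.layer1),  # 256-channel output
    resnet.layer2,                       # 512
    resnet.layer3,                       # 1024
    resnet.layer4,                       # 2048
])
# One small projector per block, feeding that block's own Barlow Twins loss
# (dimensions here are placeholders, not the paper's projector architecture).
dims = [256, 512, 1024, 2048]
projectors = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(d, 256))
    for d in dims
])
opt = torch.optim.SGD(list(blocks.parameters()) + list(projectors.parameters()), lr=0.1)

def training_step(view1, view2):
    """One blockwise update on a pair of augmented views of the same images."""
    total = 0.0
    h1, h2 = view1, view2
    for block, proj in zip(blocks, projectors):
        # detach() stops gradients from flowing into earlier blocks, so each
        # block is updated only by its local self-supervised loss.
        h1, h2 = block(h1.detach()), block(h2.detach())
        total = total + barlow_twins_loss(proj(h1), proj(h2))
    opt.zero_grad()
    total.backward()
    opt.step()
    return total.item()
```

In this sketch all blocks are trained simultaneously in a single pass, but because the inter-block activations are detached, the update rule is local to each block; a linear probe can then be trained on top of the frozen final block's features for evaluation.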