BackLink: Supervised Local Training with Backward Links

Empowered by the backpropagation (BP) algorithm, deep neural networks have come to dominate a wide range of cognitive tasks. Standard BP, however, requires end-to-end error propagation, which incurs a large memory cost and prevents model parallelization. Existing local training methods address this obstacle by completely cutting the backward path between modules and isolating their gradients, which reduces memory cost and accelerates training. Because these methods prevent errors from flowing between modules, they also block information exchange between modules, resulting in inferior performance. This work proposes a novel local training algorithm, BackLink, which introduces inter-module backward dependency and allows errors to flow between modules, so that information can flow backward along the network. To preserve the computational advantage of local training, BackLink restricts the error propagation length within the module. Extensive experiments on various deep convolutional neural networks demonstrate that our method consistently achieves better classification performance than other local training methods. For example, in ResNet32 with 16 local modules, our method surpasses the conventional greedy local training method by 4.00% and a recent method by 1.83% in accuracy on CIFAR10. Analysis of computational costs reveals that BackLink incurs only small overheads in GPU memory and in runtime on multiple GPUs. Compared to standard BP, our method can reduce memory cost by up to 79% and simulation runtime by up to 52% in ResNet110. Our method could therefore create new opportunities for improving training algorithms towards better efficiency and biological plausibility.
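The difference between greedy local training and a backward link can be illustrated on a toy chain of scalar linear modules. The sketch below is not the paper's implementation: the names `forward` and `module_grads`, the per-module auxiliary heads, the squared-error local loss, and the rule that a link reaches back exactly one module are all assumptions chosen only to make the idea concrete. With `link=False` each module is updated from its own local loss alone; with `link=True` each module also receives the error of the next module's local loss, and that error stops at the receiving module's input.

```python
def forward(weights, x):
    """Return activations [h_0, ..., h_K] of a chain of scalar linear modules."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def module_grads(weights, heads, x, y, link=False):
    """Gradient of each module weight under local training (toy, hypothetical).

    Module k has the local loss 0.5 * (heads[k] * h_out - y) ** 2.
    link=False: module k sees only its own local loss (isolated gradients).
    link=True:  module k also receives the error of module k+1's local loss,
                propagated back through module k+1 and stopped at the input
                of module k (an assumed one-module backward link).
    """
    acts = forward(weights, x)
    grads = []
    for k in range(len(weights)):
        h_in, h_out = acts[k], acts[k + 1]
        # Error of module k's own local loss with respect to its output.
        err = (heads[k] * h_out - y) * heads[k]
        if link and k + 1 < len(weights):
            # Error of the next module's local loss with respect to the same
            # output, passed back through the next module's weight.
            h_next = acts[k + 2]
            err += (heads[k + 1] * h_next - y) * heads[k + 1] * weights[k + 1]
        # h_out = w_k * h_in, so d(h_out)/d(w_k) = h_in.
        grads.append(err * h_in)
    return grads


weights = [0.5, 2.0, 1.5]   # one scalar weight per module (hypothetical values)
heads = [1.0, 1.0, 1.0]     # fixed auxiliary heads, kept constant for brevity
greedy = module_grads(weights, heads, x=1.0, y=1.0, link=False)
linked = module_grads(weights, heads, x=1.0, y=1.0, link=True)
print("greedy local:", greedy)
print("with link:   ", linked)
```

In this toy setting the last module has no successor, so both variants give it the same gradient, while earlier modules can receive an extra error term from the module that follows them. Because the error never travels further than one module, the backward pass of each module still depends only on a bounded neighbourhood, which is the property the abstract describes as restricting the error propagation length.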
