Recently, brain-inspired spiking neural networks (SNNs) have attracted widespread research interest because of their event-driven and energy-efficient characteristics. Still, it is difficult to efficiently train deep SNNs due to the non-differentiability of their spiking activation function, which rules out the gradient descent approaches typically used for traditional artificial neural networks (ANNs). Although the adoption of a surrogate gradient (SG) formally allows losses to be back-propagated, the discrete spiking mechanism actually differentiates the loss landscape of SNNs from that of ANNs, so surrogate gradient methods fail to achieve accuracy comparable to that of ANNs. In this paper, we first analyze why the current direct training approach with surrogate gradients results in SNNs with poor generalizability. Then we introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in gradient descent with SG, so that the training process can converge to flatter minima with better generalizability. Meanwhile, we demonstrate that TET improves the temporal scalability of SNNs and induces a temporally inheritable training scheme for acceleration. Our method consistently outperforms the SOTA on all reported mainstream datasets, including CIFAR-10/100 and ImageNet. Remarkably, on DVS-CIFAR10 we obtain 83$\%$ top-1 accuracy, more than a 10$\%$ improvement over the existing state of the art. Codes are available at \url{https://github.com/Gus-Lab/temporal_efficient_training}.
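To make the idea of per-time-step supervision concrete, the following is a minimal sketch (not the released implementation) of a TET-style objective, assuming the SNN produces a readout $O(t)$ at every simulation step; the function name \texttt{tet\_style\_loss}, the tensor layout, and the plain averaging over steps are illustrative assumptions rather than the exact loss of the paper.
\begin{verbatim}
import torch
import torch.nn.functional as F

def tet_style_loss(outputs, target):
    # outputs: (T, batch, num_classes) readout of the SNN at each
    #          of the T simulation steps (assumed layout)
    # target:  (batch,) class indices
    #
    # Rather than averaging the outputs over time and applying one
    # cross-entropy loss, the loss is evaluated at every time step
    # and then averaged, so each step receives a direct gradient.
    T = outputs.shape[0]
    loss = 0.0
    for t in range(T):
        loss = loss + F.cross_entropy(outputs[t], target)
    return loss / T
\end{verbatim}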