We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism. The former offers an energy-efficient and event-driven paradigm for deep learning, while the latter can capture feature dependencies, enabling Transformers to achieve strong performance. It is intuitively promising to explore the marriage between them. In this paper, we leverage both the self-attention capability and the biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA) mechanism as well as a powerful framework, named Spiking Transformer (Spikformer). The SSA mechanism in Spikformer models sparse visual features using spike-form Query, Key, and Value without softmax. Since its computation is sparse and avoids multiplication, SSA is efficient and has low computational energy consumption. We show that Spikformer with SSA outperforms state-of-the-art SNN-like frameworks in image classification on both neuromorphic and static datasets. Spikformer (66.3M parameters), comparable in size to SEW-ResNet-152 (60.2M, 69.26%), achieves 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.
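To make the idea concrete, the core of SSA can be sketched numerically: Query, Key, and Value are binary spike tensors, so the attention map is computed by matrix products of 0/1 entries (reducible to additions), scaled by a factor, and passed through a spiking nonlinearity instead of softmax. The following is a minimal illustrative sketch, not the paper's implementation; the tensor shapes, the scaling factor `s`, and the simple threshold stand-in for the spiking neuron layer are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 8  # number of tokens and feature dimension (hypothetical sizes)

# Spike-form Query, Key, Value: binary {0, 1} tensors produced by spiking neurons.
Q = (rng.random((N, d)) < 0.5).astype(np.float32)
K = (rng.random((N, d)) < 0.5).astype(np.float32)
V = (rng.random((N, d)) < 0.5).astype(np.float32)

s = 0.125  # scaling factor to keep magnitudes small (assumed value)

# Spike-spike dot products: with binary operands, each product degenerates
# to an addition, which is the source of SSA's energy efficiency.
attn = Q @ K.T          # (N, N) attention map, no softmax applied
out = (attn @ V) * s    # (N, d) aggregated values

# Crude threshold as a stand-in for the spiking neuron layer that
# re-binarizes the output in the actual architecture.
spikes = (out >= 0.5).astype(np.float32)
```

Because all entries of `Q`, `K`, and `V` are 0 or 1, the result stays non-negative, which is one reason the softmax normalization can be dropped.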