As the size of large language models continues to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we have yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the RWKV language model, we successfully implement `SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train the proposed model in three variants: 45M, 125M, and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self-attention with a mechanism whose computational complexity grows linearly, rather than quadratically, with sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks, while consuming 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
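To make the two ideas the abstract combines concrete, the following is a minimal, illustrative Python (PyTorch) sketch, not the authors' implementation: (1) an RWKV-style recurrent token-mixing step whose cost is linear in sequence length, because tokens are streamed one at a time through a decayed running state instead of attending over all token pairs, and (2) a binary, event-driven spiking activation at the block output. All names, dimensions, decay parameterization, and the hard threshold are assumptions chosen for exposition only.

import torch

def spike(x: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    # Binary, event-driven activation: emit a spike (1) only where the input
    # crosses the threshold; elsewhere the output is 0.
    return (x >= threshold).float()

def linear_token_mixing(x, decay, k_proj, v_proj, r_proj):
    # Stream tokens sequentially through a running state, so the cost is O(T)
    # in sequence length T rather than O(T^2) as in full self-attention.
    #   x: (T, D) input sequence, decay: (D,) per-channel decay rate,
    #   k_proj / v_proj / r_proj: D -> D linear maps (hypothetical names).
    T, D = x.shape
    num = torch.zeros(D)   # running, decayed sum of key-weighted values
    den = torch.zeros(D)   # running, decayed sum of key weights (normalizer)
    outputs = []
    for t in range(T):     # tokens arrive one at a time, as in a typical SNN
        k = k_proj(x[t]).exp()           # positive weight for this token
        v = v_proj(x[t])                 # value carried by this token
        r = torch.sigmoid(r_proj(x[t]))  # receptance gate
        num = torch.exp(-decay) * num + k * v
        den = torch.exp(-decay) * den + k
        outputs.append(spike(r * num / (den + 1e-8)))  # binary spikes leave the block
    return torch.stack(outputs)

# Example usage with toy sizes (assumed, not taken from the paper):
if __name__ == "__main__":
    D, T = 16, 8
    proj = lambda: torch.nn.Linear(D, D, bias=False)
    out = linear_token_mixing(torch.randn(T, D), torch.rand(D), proj(), proj(), proj())
    print(out.shape, out.unique())  # torch.Size([8, 16]), values drawn from {0., 1.}

The loop keeps only two D-dimensional running vectors per step, which is why the per-token cost stays constant as the sequence grows; the thresholded output means downstream layers see only sparse binary events.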