As the size of large language models continues to grow, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead of model inference. While SNNs have become competitive with non-spiking models on many computer vision tasks, they have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and the effectiveness of SNNs in language generation has yet to be demonstrated. In this paper, we successfully implement `SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train the proposed model in three variants: 45M, 125M, and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block, replacing multi-head self-attention with a mechanism whose computational complexity grows linearly, rather than quadratically, with sequence length. Input tokens are instead streamed into our attention mechanism sequentially (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks, while consuming 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
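The abstract's key architectural claim is that replacing self-attention with a sequentially streamed, decayed-state mechanism reduces per-sequence cost from quadratic to linear in sequence length. As a rough illustration only (the abstract does not specify SpikeGPT's exact formulation, and the function, projections, and decay vector below are hypothetical), a minimal sketch of such a linear-time streaming attention keeps a running key-weighted numerator and denominator that are updated in O(d) per token, instead of attending over all previous tokens:

```python
import numpy as np

def linear_streaming_attention(tokens, w, k_proj, v_proj):
    # Hypothetical sketch of a linear-complexity, token-streaming
    # attention substitute: each incoming token updates a decayed
    # running state in O(d), so total cost is O(T * d) rather than
    # the O(T^2 * d) of full pairwise self-attention.
    d = k_proj.shape[0]
    num = np.zeros(d)   # decayed sum of key-weighted values
    den = np.zeros(d)   # decayed sum of key weights (normalizer)
    decay = np.exp(-w)  # per-channel exponential decay, w > 0
    outputs = []
    for x in tokens:                 # tokens arrive one at a time
        k = np.exp(k_proj @ x)       # positive "key" weighting
        v = v_proj @ x               # "value" for this token
        num = decay * num + k * v    # O(d) state update
        den = decay * den + k
        outputs.append(num / (den + 1e-8))
    return np.array(outputs)
```

Because the state carries all history, each output depends on every earlier token, yet no per-pair score matrix is ever materialized, which is what makes sequential, event-driven processing feasible.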