To achieve the low latency, high throughput, and energy efficiency benefits of Spiking Neural Networks (SNNs), reducing their memory and compute requirements when running on neuromorphic hardware is an important step. Neuromorphic architectures allow massively parallel computation with variable and local bit-precision. However, how different bit-precisions should be allocated to different layers or connections of the network is not trivial. In this work, we demonstrate how a layer-wise Hessian trace analysis can measure the sensitivity of the loss to any perturbation of a layer's weights, and how this sensitivity can guide the allocation of layer-specific bit-precision when quantizing an SNN. In addition, current gradient-based methods for training SNNs use a complex neuron model with multiple state variables, which is not ideal for compute and memory efficiency. To address this challenge, we present a simplified neuron model that reduces the number of state variables by 4-fold while remaining compatible with gradient-based training. We find that the impact of a layer's bit-precision on model accuracy correlates well with that layer's Hessian trace. The accuracy of the optimal quantized network dropped by only 0.2%, while the network size was reduced by 58%. This reduces memory usage and allows fixed-point arithmetic with simpler digital circuits, increasing overall throughput and energy efficiency.
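To make the layer-wise sensitivity idea concrete, the sketch below shows one common way to estimate a layer's Hessian trace: Hutchinson's randomized estimator, which uses Hessian-vector products and Rademacher probe vectors. This is a minimal illustration of the general technique in PyTorch, not necessarily the paper's exact procedure; the function name `layer_hessian_trace` and its arguments are hypothetical.

```python
import torch


def layer_hessian_trace(loss, params, num_samples=32):
    """Estimate tr(H) for one layer's parameters with Hutchinson's method:
    E[v^T H v] = tr(H) when v has i.i.d. Rademacher (+/-1) entries."""
    # First-order gradients with the graph retained so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace_estimates = []
    for _ in range(num_samples):
        # Rademacher probe vectors, one per parameter tensor of this layer.
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product via a second backward pass: d/dp (g . v) = H v
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        # v^T H v, summed over the layer's parameter tensors.
        trace_estimates.append(sum((v * hv).sum() for v, hv in zip(vs, hvs)))
    return torch.stack(trace_estimates).mean()
```

Under this scheme, layers with a larger estimated trace are more sensitive to weight perturbation and would be assigned higher bit-precision, while layers with a small trace can be quantized more aggressively.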