Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Although existing pruning methods can provide extremely high weight sparsity for deep SNNs, this high sparsity introduces a workload imbalance problem. Specifically, workload imbalance occurs when different numbers of non-zero weights are assigned to hardware units running in parallel, which results in low hardware utilization and thus imposes longer latency and higher energy costs. In preliminary experiments, we show that sparse SNNs ($\sim$98% weight sparsity) can suffer from hardware utilization as low as $\sim$59%. To alleviate the workload imbalance problem, we propose u-Ticket, which monitors and adjusts the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, thus guaranteeing that the final ticket achieves optimal utilization when deployed onto the hardware. Experiments indicate that u-Ticket can guarantee up to 100% hardware utilization, reducing latency by up to 76.9% and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
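To make the utilization metric concrete, the sketch below is a minimal illustration (not the paper's exact formulation): it assumes each row of a pruned weight matrix is mapped round-robin to one of `num_pes` parallel processing elements, that a PE's workload is its count of non-zero weights, and that utilization is the mean PE workload divided by the maximum. The function name `utilization` and the mapping scheme are illustrative assumptions.

```python
import numpy as np

def utilization(weights: np.ndarray, num_pes: int) -> float:
    """Estimate hardware utilization of a pruned weight matrix.

    Assumes (illustratively) that rows of `weights` are distributed
    round-robin across `num_pes` parallel processing elements, and
    that each PE's workload is the number of non-zero weights it
    must process. Utilization is the mean PE workload over the max
    PE workload, so 1.0 indicates a perfectly balanced assignment.
    """
    nnz_per_row = np.count_nonzero(weights, axis=1)
    # Sum the non-zero counts of the rows assigned to each PE.
    workloads = np.array(
        [nnz_per_row[pe::num_pes].sum() for pe in range(num_pes)]
    )
    return workloads.mean() / workloads.max()

# Example: a ~98%-sparse layer can leave parallel PEs imbalanced,
# since pruning removes different numbers of weights per row.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 512))
mask = rng.random(w.shape) < 0.02   # keep ~2% of weights
print(f"utilization: {utilization(w * mask, num_pes=16):.2f}")
```

Under this toy definition, a utilization below 1.0 means some PEs sit idle while the most heavily loaded PE finishes, which is the source of the extra latency and energy cost the abstract describes.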