Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state of the art in hardware accelerators for lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emergent pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorders the weights and uses a simple indexing unit in hardware to create matches between the weights and activations. Through this compiler-hardware codesign, the SPS dataflow achieves a higher degree of parallelism without high indexing overhead and without loss of model accuracy. Evaluated on popular benchmarks such as VGG and ResNet, the SPS dataflow and its accompanying neural network compiler outperform prior work in convolutional neural network (CNN) accelerator designs targeting FPGA devices. Compared with other sparsity-supporting weight storage formats, SPS yields a 4.49x energy efficiency gain while lowering storage requirements by 3.67x for total weight storage (non-pruned weights plus indexing) and by 22,044x for indexing memory.
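
To make the indexing argument concrete, the following is a minimal sketch of how a periodic pattern assignment can remove per-weight index storage. It is not the paper's compiler or hardware: the pattern set `PATTERNS`, the period, the rule that kernel `k` uses pattern `k % PERIOD`, the 4-bit index width, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only. PATTERNS, PERIOD, the k % PERIOD assignment rule,
# and the bit widths are assumptions, not details taken from the paper.

# Each tuple lists the positions kept in a flattened 3x3 kernel (indices 0..8).
PATTERNS = [
    (0, 1, 3, 4),
    (1, 2, 4, 5),
    (3, 4, 6, 7),
    (4, 5, 7, 8),
]
PERIOD = len(PATTERNS)


def pattern_for(k):
    """Pattern of kernel k, derived from its position alone (no stored index)."""
    return PATTERNS[k % PERIOD]


def compress(kernels):
    """Pack each flattened 3x3 kernel down to the weights its pattern keeps."""
    return [[kernel[p] for p in pattern_for(k)] for k, kernel in enumerate(kernels)]


def sparse_dot(packed, k, window):
    """Dot product of packed kernel k with a flattened 3x3 activation window."""
    return sum(w * window[p] for w, p in zip(packed[k], pattern_for(k)))


def per_weight_index_bits(num_kernels, kept=4, bits_per_index=4):
    """Index cost when every kept weight stores its own position."""
    return num_kernels * kept * bits_per_index


def periodic_index_bits(kept=4, bits_per_index=4):
    """Index cost when only one period of patterns is stored."""
    return PERIOD * kept * bits_per_index
```

In this toy setting the per-weight index cost grows with the number of kernels, while the periodic cost stays fixed at one period of patterns, which is the kind of regularity a simple hardware indexing unit could exploit. The ratio produced here depends entirely on the assumed sizes and is not meant to reproduce the figures reported above.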
