The large memory bandwidth between the local buffer and external DRAM has become the speedup bottleneck of CNN hardware accelerators, especially for activation maps. To reduce memory bandwidth, we propose to dynamically learn to prune unimportant blocks with zero block regularization of activation maps (Zebra). This strategy has low computational overhead and can be easily integrated with other pruning methods for better performance. Experimental results show that the proposed method reduces memory bandwidth by 70\% for ResNet-18 on Tiny-ImageNet within a 1\% accuracy drop, and achieves a 2\% accuracy gain when combined with Network Slimming.
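A minimal PyTorch sketch of block-wise activation regularization in the spirit described above: each spatial tile of an activation map is treated as a group, and a group-lasso style penalty pushes whole tiles toward exact zero so they need not be transferred to DRAM. The block size, the penalty weight \texttt{lam}, and the L2-norm-per-block form are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
import torch
import torch.nn.functional as F

def zero_block_penalty(act: torch.Tensor, block: int = 4) -> torch.Tensor:
    """Group-lasso style penalty over non-overlapping spatial blocks.

    act: activation map of shape (N, C, H, W). Each block x block
    spatial tile is one group; the penalty sums the L2 norms of all
    tiles, which drives entire tiles toward exact zero.
    (Illustrative sketch; block size and penalty form are assumptions.)
    """
    n, c, h, w = act.shape
    # Pad so H and W are divisible by the block size.
    pad_h = (-h) % block
    pad_w = (-w) % block
    x = F.pad(act, (0, pad_w, 0, pad_h))
    # Carve out (block x block) tiles: shape (N, C, H/b, W/b, b, b).
    x = x.unfold(2, block, block).unfold(3, block, block)
    # L2 norm of each tile, summed over all tiles.
    block_norms = x.reshape(*x.shape[:4], -1).norm(dim=-1)
    return block_norms.sum()

# Usage: add the penalty to the task loss during training, e.g.
#   loss = criterion(logits, targets) \
#        + lam * sum(zero_block_penalty(a) for a in collected_acts)
# where `lam` and `collected_acts` (activations gathered via forward
# hooks) are hypothetical names for this sketch.
\end{verbatim}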