Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, executing CNNs requires a large amount of computing resources and generates heavy memory traffic, which poses a severe challenge for computing system design. By optimizing parallel execution and data reuse in convolution, systolic architectures demonstrate great advantages in accelerating CNN computations. However, the regular internal data transmission paths of traditional systolic architectures prevent them from fully leveraging the benefits of neural network sparsity, and deploying fine-grained sparsity on existing systolic architectures is greatly hindered by the computational overhead it incurs. In this work, we propose S2Engine, a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow in convolution. Compared to the naive systolic array, S2Engine achieves improvements of about $3.2\times$ in speed and about $3.0\times$ in energy efficiency.
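The idea of selecting aligned data from compressed streams can be illustrated with a small software sketch. It is not the S2Engine hardware design: the `(position, value)` compression format and the helper names `compress` and `sparse_dot` are assumptions made purely for illustration. The sketch shows how a single processing element could skip zero operands by multiplying only the entries whose positions match in both compressed streams.

```python
def compress(dense):
    """Keep only the nonzero entries as (position, value) pairs (assumed format)."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(features, weights):
    """Multiply-accumulate only where both compressed streams share a position.

    Both inputs must be sorted by position, which compress() guarantees.
    Returns the accumulated sum and the number of multiplications performed.
    """
    i = j = 0
    acc = 0
    macs = 0
    while i < len(features) and j < len(weights):
        f_pos, f_val = features[i]
        w_pos, w_val = weights[j]
        if f_pos == w_pos:
            # Aligned pair: both operands are nonzero at this position.
            acc += f_val * w_val
            macs += 1
            i += 1
            j += 1
        elif f_pos < w_pos:
            i += 1  # feature has no nonzero weight partner; skip it
        else:
            j += 1  # weight has no nonzero feature partner; skip it
    return acc, macs


# Hypothetical example data: 8-element feature and weight vectors.
features = [0, 3, 0, 0, 2, 0, 1, 0]
weights = [1, 2, 0, 0, 0, 0, 4, 5]

acc, macs = sparse_dot(compress(features), compress(weights))
dense = sum(f * w for f, w in zip(features, weights))
```

In this example the compressed path produces the same result as the dense dot product while performing 2 multiplications instead of 8, which is the kind of saving that sparsity-aware dataflow aims for.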