Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of the layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with a configurable pipeline, with the goal of selecting an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate either in normal or in shallow pipeline mode, thus balancing the execution time in cycles against the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power for the same applications, thus offering a combined energy-delay-product improvement of between 1.4x and 1.8x.
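The per-layer selection described above can be sketched as a simple latency comparison: a deeper (normal) pipeline sustains a higher clock frequency, while a shallow pipeline may finish a layer in fewer cycles at a lower frequency. The sketch below is illustrative only; the layer names, cycle counts, and frequencies are hypothetical placeholders, not figures from the paper.

```python
# Illustrative sketch of per-layer pipeline-mode selection (not the paper's
# actual cost model). Each mode is described by (cycles, frequency in MHz);
# all numbers below are hypothetical.

def latency_ns(cycles, freq_mhz):
    """Execution time of one layer in nanoseconds: cycles / frequency."""
    return cycles / freq_mhz * 1e3  # 1 MHz -> 1000 ns per 1000 cycles

# Hypothetical per-layer profiles for the two pipeline modes.
layers = [
    {"normal": (12000, 500), "shallow": (9000, 400)},   # layer 1
    {"normal": (30000, 500), "shallow": (26000, 400)},  # layer 2
]

# Pick, for each layer, the mode with the smaller latency.
best = [min(layer, key=lambda m: latency_ns(*layer[m])) for layer in layers]
print(best)  # each entry is "normal" or "shallow"
```

With these placeholder numbers, layer 1 favors the shallow pipeline (22,500 ns vs. 24,000 ns) while layer 2 favors the normal one (60,000 ns vs. 65,000 ns), illustrating why a single fixed configuration is suboptimal across layers.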