In this article, we investigate the impact of the architectural parameters of array-based DNN accelerators on energy consumption and performance across a wide variety of network topologies. For this purpose, we have developed a tool that simulates the execution of neural networks on array-based accelerators and can test different configurations to estimate energy consumption and processing latency. Based on our analysis of the behavior of benchmark networks under different architectural parameters, we offer several recommendations for an efficient yet high-performance accelerator design. Next, we propose a heterogeneous multi-core chip scheme for deep neural network execution. Evaluations over a small, selectively chosen search space indicate that executing a neural network on its near-optimal core configuration can reduce energy consumption and energy-delay product by up to 36% and 67%, respectively. We also propose an algorithm that distributes the processing of a network's layers across multiple cores of the same type to speed up computation through model parallelism. Evaluations on different networks and with different numbers of cores confirm the effectiveness of the proposed algorithm in bringing the processing speedup close to optimal values.