With the widespread use of Deep Neural Networks (DNNs), machine learning algorithms have evolved in two diverse directions -- one with ever-increasing connection density for better accuracy, and the other with more compact sizing for energy efficiency. The increase in connection density intensifies on-chip data movement, making efficient on-chip communication a critical function of the DNN accelerator. The contribution of this work is threefold. First, we illustrate that point-to-point (P2P)-based interconnects are incapable of handling the high volume of on-chip data movement of DNNs. Second, we evaluate P2P and network-on-chip (NoC) interconnects (with a regular topology such as a mesh) for SRAM- and ReRAM-based in-memory computing (IMC) architectures across a range of DNNs. This analysis underscores the necessity of choosing the optimal interconnect for an IMC-based DNN accelerator. Finally, we evaluate different DNNs experimentally to obtain the performance of the IMC architecture with both NoC-tree and NoC-mesh. We conclude that, at the tile level, NoC-tree is appropriate for compact DNNs employed at the edge, while NoC-mesh is necessary to accelerate DNNs with high connection density. Furthermore, we propose a technique to determine the optimal interconnect choice for any given DNN. In this technique, we use analytical NoC models to evaluate the end-to-end communication latency of any given DNN. We demonstrate that interconnect optimization in the IMC architecture yields up to 6$\times$ improvement in energy-delay-area product for VGG-19 inference compared to state-of-the-art ReRAM-based IMC architectures.
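The interconnect comparison above can be illustrated with a simple zero-load latency model. The sketch below is not the paper's actual analytical model; the `2k/3` average-hop approximation for a `k x k` mesh under uniform traffic, the binary-tree hop count, and the router/link delay parameters are all illustrative assumptions.

```python
import math

def mesh_avg_hops(n_tiles):
    """Average hop count in a square k x k NoC-mesh (assumes n_tiles is a
    perfect square). Under uniform random traffic, the average Manhattan
    distance is approximately 2k/3 hops."""
    k = math.isqrt(n_tiles)
    return 2 * k / 3

def tree_avg_hops(n_tiles):
    """Worst-case hop count in a binary NoC-tree: a packet travels up to
    the root and back down, i.e. 2 * depth hops."""
    return 2 * math.ceil(math.log2(n_tiles))

def zero_load_latency(hops, flits, t_router=2, t_link=1):
    """Zero-load packet latency in cycles: per-hop router pipeline and
    link traversal delay, plus serialization of the packet's flits.
    t_router and t_link are hypothetical example values."""
    return hops * (t_router + t_link) + flits

# Example: a 16-tile IMC accelerator sending 8-flit activation packets.
mesh_lat = zero_load_latency(mesh_avg_hops(16), flits=8)
tree_lat = zero_load_latency(tree_avg_hops(16), flits=8)
```

Note that hop count alone favors the mesh; the tile-level conclusion in the abstract (NoC-tree for compact edge DNNs) also accounts for traffic volume and the area/energy cost of mesh routers, which a zero-load model does not capture.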