The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architectures. DNN accelerators use a large number of processing elements (PEs) and on-chip memory to store weights and other parameters. As the communication backbone of a DNN accelerator, the network-on-chip (NoC) plays an important role in supporting various dataflow patterns and in enabling parallel processing and communication. However, the widely used mesh-based NoC architectures inherently cannot efficiently support the one-to-many and many-to-one traffic that is prevalent in DNN workloads. In this paper, we propose a modified mesh architecture with a one-way/two-way streaming bus to speed up one-to-many (multicast) traffic, and the use of gather packets to support many-to-one (gather) traffic. An analysis of the runtime latency of a convolutional layer shows that, for an Output Stationary (OS) dataflow architecture, the two-way streaming architecture achieves a larger improvement than the one-way streaming architecture. Simulation results demonstrate that, on modified mesh architectures supporting two-way streaming, gather packets reduce runtime latency by up to 1.8 times and network power consumption by up to 1.7 times compared with the repetitive unicast method.
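To make the many-to-one case more concrete, here is a toy traffic count contrasting repetitive unicast with a single gather packet. It is not the model or simulator used in this work. It assumes a single mesh row of N PEs whose results all flow to a sink one hop beyond the nearest PE, one header flit and one payload flit per PE, and a gather packet that starts at the farthest PE and appends each PE's payload as it passes. The names `TrafficCost`, `repetitive_unicast`, and `gather_packet` are hypothetical, and contention, serialization latency, and waiting time at each router are ignored.

```python
from dataclasses import dataclass


@dataclass
class TrafficCost:
    packets: int    # packets injected into the network
    flit_hops: int  # total flits carried summed over links (rough proxy for network energy)


def repetitive_unicast(num_pes: int, header_flits: int = 1, payload_flits: int = 1) -> TrafficCost:
    """Hypothetical toy model: PE i (0 = nearest the sink) sends its own packet over i + 1 hops."""
    flits_per_packet = header_flits + payload_flits
    flit_hops = sum(flits_per_packet * (i + 1) for i in range(num_pes))
    return TrafficCost(packets=num_pes, flit_hops=flit_hops)


def gather_packet(num_pes: int, header_flits: int = 1, payload_flits: int = 1) -> TrafficCost:
    """Hypothetical toy model: one packet starts at the farthest PE and collects payloads.

    On hop j (1..num_pes) the packet carries one header plus the payloads of j PEs.
    """
    flit_hops = sum(header_flits + j * payload_flits for j in range(1, num_pes + 1))
    return TrafficCost(packets=1, flit_hops=flit_hops)


if __name__ == "__main__":
    for n in (4, 8, 16):
        u = repetitive_unicast(n)
        g = gather_packet(n)
        print(f"N={n:2d}  unicast: {u.packets} pkts, {u.flit_hops} flit-hops  "
              f"gather: {g.packets} pkt, {g.flit_hops} flit-hops  "
              f"ratio={u.flit_hops / g.flit_hops:.2f}")
```

In this simplified count the payload flit-hops are identical for both schemes. The difference comes entirely from header flits, which repetitive unicast sends along every PE's path while the gather packet carries a single header, and from injecting one packet instead of N. Real savings in a NoC also depend on routing, contention, and flow control, which this sketch does not model.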