Deep neural network (DNN) models continue to grow in size and complexity, demanding higher computational power to enable real-time inference. To meet these computational demands efficiently, hardware accelerators are being developed and deployed across scales. This naturally requires an efficient scale-out mechanism for increasing compute density as required by the application. 2.5D integration over an interposer has emerged as a promising solution, but as we show in this work, the limited interposer bandwidth and multiple hops in the Network-on-Package (NoP) can diminish the benefits of the approach. To cope with this challenge, we propose WIENNA, a wireless NoP-based 2.5D DNN accelerator. In WIENNA, the wireless NoP connects an array of DNN accelerator chiplets to the global buffer chiplet, providing high-bandwidth multicasting capabilities. Here, we also identify the dataflow style that most efficiently exploits the wireless NoP's high-bandwidth multicasting capability on each layer. With modest area and power overheads, WIENNA achieves 2.2X--5.1X higher throughput and 38.2% lower energy than an interposer-based NoP design.