Photonic Microring Resonator (MRR)-based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep Convolutional Neural Networks (CNNs). However, prior MRR-based CNN accelerators lack efficient adaptability for CNNs with mixed-sized tensors, such as depthwise-separable CNNs. Performing inference of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency. In this paper, we present a novel method of introducing reconfigurability into MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator's hardware components and the CNN tensors processed on them. We classify state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placement of their constituent hardware components. We then apply our method to introduce reconfigurability into accelerators from both categories, thereby improving their parallelism, their flexibility in efficiently mapping tensors of different sizes, their speed, and their overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works under an area-proportionate outlook (equal hardware area for all accelerators). Our evaluation for the inference of four modern CNNs indicates that our reconfigurable CNN accelerators provide improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work.