With the increasing diversity of Deep Neural Network (DNN) models in terms of layer shapes and sizes, the research community has been investigating flexible/reconfigurable accelerator substrates. This line of research has opened up two challenges. The first is to determine the appropriate amount of flexibility within an accelerator array, trading off the performance benefits against the area overheads of reconfigurability. The second is to determine the right configuration of the array for the current DNN model and/or layer and to reconfigure the accelerator at runtime. This work introduces a new class of accelerators that we call Self Adaptive Reconfigurable Array (SARA). SARA architectures comprise both a reconfigurable array and a hardware unit capable of determining an optimized configuration for the array at runtime. We demonstrate an instance of SARA with an accelerator we call SAGAR, which introduces a novel reconfigurable systolic array that can be configured to work as a distributed collection of smaller arrays of various sizes or as a single array with flexible aspect ratios. We also develop a novel recommendation neural network called ADAPTNET, which recommends an array configuration and dataflow for the current layer parameters. ADAPTNET runs at runtime on an integrated custom hardware unit, ADAPTNETX, which also reconfigures the array, making the entire accelerator self-sufficient. SAGAR is capable of providing the same mapping flexibility as a collection of 1024 4x4 arrays working as a distributed system while achieving 3.5x more power efficiency and 3.2x higher compute density. Furthermore, the runtime achieved on the recommended parameters from ADAPTNET is 99.93% of the best achievable runtime.
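To give a feel for the size of the configuration space such an array exposes, the following is a minimal sketch, not the authors' method. It assumes the array is built from 1024 tiles of 4x4 processing elements (the granularity used in the comparison above), that every partition in a configuration has the same shape, and that all tiles are used. The function name `enumerate_configs` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def enumerate_configs(num_tiles: int = 1024, tile_dim: int = 4) -> List[Tuple[int, int, int]]:
    """List (partitions, rows, cols) options for a tiled array.

    Assumption (illustrative only): all partitions share one shape,
    every tile is used, and each partition is a rectangle of tiles.
    """
    configs = []
    for partitions in range(1, num_tiles + 1):
        if num_tiles % partitions:
            continue  # partitions must divide the tile count evenly
        tiles_per_partition = num_tiles // partitions
        for tile_rows in range(1, tiles_per_partition + 1):
            if tiles_per_partition % tile_rows:
                continue  # keep only rectangular arrangements of tiles
            tile_cols = tiles_per_partition // tile_rows
            # Convert tile counts to processing-element dimensions.
            configs.append((partitions, tile_rows * tile_dim, tile_cols * tile_dim))
    return configs


if __name__ == "__main__":
    configs = enumerate_configs()
    print(len(configs), "candidate configurations")
    print("single square array present:", (1, 128, 128) in configs)
    print("fully distributed option present:", (1024, 4, 4) in configs)
```

Under these assumptions the two extremes named in the abstract both appear in the list: one monolithic array with a chosen aspect ratio, and a fully distributed collection of 4x4 arrays. Each option would additionally be paired with a dataflow, which is the joint choice that a recommender such as ADAPTNET is described as making per layer.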