With the increasing diversity of Deep Neural Network (DNN) models in terms of layer shapes and sizes, the research community has been investigating flexible/reconfigurable accelerator substrates. This line of research has opened up two challenges. The first is to determine the appropriate amount of flexibility within an accelerator array, trading off the performance benefits against the area overheads of reconfigurability. The second is to determine the right configuration of the array for the current DNN model and/or layer and to reconfigure the accelerator at runtime. This work introduces a new class of accelerators that we call Self Adaptive Reconfigurable Array (SARA). SARA architectures comprise both a reconfigurable array and a hardware unit capable of determining an optimized configuration for the array at runtime. We demonstrate an instance of SARA with an accelerator we call SAGAR, which introduces a novel reconfigurable systolic array that can be configured to work as a distributed collection of smaller arrays of various sizes or as a single array with flexible aspect ratios. We also develop a novel recommendation neural network called ADAPTNET, which recommends an array configuration and dataflow for the current layer parameters. ADAPTNET runs on integrated custom hardware, ADAPTNETX, which executes it at runtime and reconfigures the array, making the entire accelerator self-sufficient. SAGAR is capable of providing the same mapping flexibility as a collection of 1024 4x4 arrays working as a distributed system while achieving 3.5x more power efficiency and 3.2x higher compute density. Furthermore, the runtime achieved on the recommended parameters from ADAPTNET is 99.93% of the best achievable runtime.
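To give a feel for the size of the configuration space a recommender such as ADAPTNET has to choose from, the following is a minimal sketch that enumerates candidate partitions of a substrate built from 1024 4x4 tiles (16384 MAC units in total). The abstract does not specify which partitions SAGAR actually supports, so the rules used here are assumptions made purely for illustration: every sub-array in a configuration has the same shape, and each dimension is a power-of-two multiple of the 4-wide tile. The names `Config` and `enumerate_configs` are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple

TILE = 4                               # side of the smallest sub-array (4x4, from the abstract)
NUM_TILES = 1024                       # number of 4x4 tiles (from the abstract)
TOTAL_MACS = NUM_TILES * TILE * TILE   # 16384 MAC units in total


class Config(NamedTuple):
    """One candidate configuration: `count` identical sub-arrays of rows x cols."""
    rows: int
    cols: int
    count: int


def enumerate_configs(total_macs: int = TOTAL_MACS, tile: int = TILE) -> List[Config]:
    """Enumerate equal-shape partitions of the substrate.

    Assumption (not from the paper): rows and cols are power-of-two
    multiples of `tile`, and all sub-arrays in a configuration share
    one shape.
    """
    configs = []
    rows = tile
    while rows * tile <= total_macs:
        cols = tile
        while rows * cols <= total_macs:
            configs.append(Config(rows, cols, total_macs // (rows * cols)))
            cols *= 2
        rows *= 2
    return configs


if __name__ == "__main__":
    configs = enumerate_configs()
    single = [c for c in configs if c.count == 1]
    print(f"{len(configs)} candidate configurations")
    print("single-array aspect ratios:", [(c.rows, c.cols) for c in single])
```

Under these assumptions, the fully distributed case (1024 arrays of 4x4) and the monolithic case (one 128x128 array) are the two extremes of the same list, and the single-array entries illustrate the "flexible aspect ratios" mentioned above. A real design would also pair each entry with a dataflow choice, which this sketch leaves out.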