Current systolic arrays still suffer from low performance and low processing-element (PE) utilization on many real workloads because the fixed array topology is mismatched with the diverse kernels of deep neural networks (DNNs). We present ReDas, a flexible and lightweight systolic array that adapts to various DNN models by supporting dynamic fine-grained reshaping and multiple dataflows. The key idea is to construct reconfigurable roundabout data paths using only the short connections between neighboring PEs. A 128$\times$128 array supports 129 different logical shapes and 3 dataflows: input-stationary (IS), output-stationary (OS), and weight-stationary (WS). Experiments on DNN models from MLPerf demonstrate that ReDas achieves a 3.09$\times$ speedup on average over state-of-the-art work.
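To make the mismatch between a fixed array shape and a kernel shape concrete, the following is a minimal sketch of a purely spatial utilization model. It is not ReDas's mapping algorithm, and the candidate shapes are not its 129 logical shapes: the function names, the power-of-two shape set, and the assumption that a K$\times$N operand is tiled directly onto the rows and columns of the array are all illustrative. The model also ignores pipeline fill and drain as well as memory bandwidth.

```python
from math import ceil


def utilization(k, n, rows, cols):
    """Fraction of PE slots doing useful work when a k x n operand
    is tiled onto a rows x cols array (hypothetical spatial model)."""
    # Number of passes ("folds") needed to cover the operand.
    folds = ceil(k / rows) * ceil(n / cols)
    return (k * n) / (folds * rows * cols)


def candidate_shapes(total_pes=128 * 128, min_side=16):
    """Hypothetical power-of-two logical shapes with a fixed PE count.
    This is an illustrative set, not the shape set of the paper."""
    shapes = []
    rows = min_side
    while rows <= total_pes // min_side:
        shapes.append((rows, total_pes // rows))
        rows *= 2
    return shapes


def best_shape(k, n, shapes):
    """Pick the logical shape with the highest modeled utilization."""
    return max(shapes, key=lambda s: utilization(k, n, *s))


if __name__ == "__main__":
    k, n = 32, 512  # a short, wide operand
    print("fixed 128x128:", utilization(k, n, 128, 128))
    shape = best_shape(k, n, candidate_shapes())
    print("best reshaped:", shape, utilization(k, n, *shape))
```

Under this simplified model, a 32$\times$512 operand occupies only a quarter of the PE slots of a square 128$\times$128 array, whereas a 32$\times$512 logical shape with the same number of PEs is fully occupied. This is the kind of gap that reshaping aims to close.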