Dataflow accelerators are becoming increasingly popular as a way to meet the extreme compute demands of deep learning in commercial and scientific applications. Although these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying degrees of flexibility in data orchestration, i.e., in the dataflow and tiling optimizations used to improve efficiency. Designing new algorithms for a target problem, and new mapping approaches that execute those algorithms on new hardware, raises several challenges, which previous works have addressed individually. To address these challenges as a whole, we present Union, a HW-SW co-design ecosystem for spatial accelerators built within the popular MLIR compiler infrastructure. Our framework allows users to explore different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers that can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators; this map space can be systematically pruned based on constraints from the hardware, the workload, and the mapper. We demonstrate the value of Union for the community with several case studies that examine offloading different tensor operations (CONV, GEMM, and tensor contraction) onto diverse accelerator architectures using different mapping schemes.
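To make the idea of a constraint-pruned map space concrete, here is a minimal Python sketch. It is a generic illustration, not Union's mapping abstraction or API. It assumes a GEMM C[M,N] += A[M,K]·B[K,N] and treats a mapping as a choice of tile sizes (tM, tN, tK) that evenly divide each dimension. It then applies two hypothetical hardware constraints: the tM × tN output tile must fit on a PE array of `num_pes` elements, and the A, B, and C tiles together must fit in an on-chip buffer of `buffer_words` words. All function and parameter names are illustrative.

```python
from itertools import product


def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def gemm_map_space(M: int, N: int, K: int) -> list[tuple[int, int, int]]:
    """Enumerate every (tM, tN, tK) tiling that evenly divides the GEMM dimensions."""
    return list(product(divisors(M), divisors(N), divisors(K)))


def prune(mappings, num_pes: int, buffer_words: int):
    """Keep only tilings that satisfy two hypothetical hardware constraints."""
    kept = []
    for tM, tN, tK in mappings:
        # Assumed spatial constraint: the tM x tN output tile is laid out across the PE array.
        if tM * tN > num_pes:
            continue
        # Assumed capacity constraint: A, B, and C tiles must fit in the buffer at the same time.
        footprint = tM * tK + tK * tN + tM * tN
        if footprint > buffer_words:
            continue
        kept.append((tM, tN, tK))
    return kept


if __name__ == "__main__":
    space = gemm_map_space(8, 8, 8)
    kept = prune(space, num_pes=16, buffer_words=64)
    print(f"{len(space)} candidate tilings, {len(kept)} remain after pruning")
```

With M = N = K = 8, the unpruned space has 4 × 4 × 4 = 64 tilings. The constraints discard, for example, (8, 8, 1), which would need 64 PEs, and (4, 4, 8), whose 80-word footprint exceeds a 64-word buffer. A fuller map space would typically also cover loop orders and the spatial/temporal assignment of loops at each memory level, which this sketch omits.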