Spatial dataflow architectures such as reconfigurable dataflow accelerators (RDAs) can provide much higher performance and efficiency than CPUs and GPUs. In particular, the vectorized reconfigurable dataflow accelerators (vRDAs) in recent literature represent a design point that uses vectorization to enhance the efficiency of dataflow architectures. Today, vRDAs can be programmed using either hardcoded kernels or MapReduce languages like Spatial, which cannot vectorize data-dependent control flow. In contrast, CPUs and GPUs can be programmed using general-purpose threaded abstractions. The ideal combination would be the generality of a threaded programming model coupled with the efficient execution model of a vRDA. We introduce Revet: a programming model, compiler, and execution model that lets threaded applications run efficiently on vRDAs. The Revet programming language uses threads to support a broader range of applications than Spatial's parallel patterns, and our MLIR-based compiler lowers this language to a generic dataflow backend that operates on streaming tensors. Finally, we show that mapping threads to dataflow outperforms GPUs, the current state of the art for threaded accelerators, by 3.8x.