In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that let users easily express flexible and complex inter-task communication structures. Second, TAPA adds a coarse-grained floorplanning step to HLS compilation so that potential critical paths can be pipelined accurately. In addition, TAPA implements several optimization techniques specifically tailored for modern HBM-based FPGAs. In our experiments with a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments, designs that were originally unroutable achieve 274 MHz on average. The framework is available at https://github.com/UCLA-VAST/tapa and the core floorplan module is available at https://github.com/UCLA-VAST/AutoBridge.