Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but FPGAs are hard to program, which creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually restructure code and perform cumbersome parameter tuning to achieve optimal performance. While existing work has leveraged many learning models to automate the design of efficient accelerators, the unpredictability of modern HLS tools is a major obstacle to maintaining high accuracy in these models. To address this problem, we propose an automated design space exploration (DSE) framework, AutoDSE, that leverages a bottleneck-guided coordinate optimizer to systematically find a better design point. AutoDSE detects the bottleneck of the design in each step and focuses on high-impact parameters to overcome it. The experimental results show that AutoDSE is able to identify a design point that achieves, on the geometric mean, a 19.9x speedup over one CPU core for the MachSuite and Rodinia benchmarks. Compared to the manually optimized HLS vision kernels in the Xilinx Vitis libraries, AutoDSE can reduce their optimization pragmas by 26.38x while achieving similar performance. With fewer than one optimization pragma per design on average, we are making progress towards democratizing customizable computing by enabling software programmers to design efficient FPGA accelerators.
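To make the idea of a bottleneck-guided coordinate optimizer concrete, the following is a minimal Python sketch under stated assumptions; it is not AutoDSE's actual implementation. The three-stage cost model `toy_evaluate`, the parameter names (`load_width`, `unroll`, `store_width`), the stage-to-parameter mapping, and the resource budget are all hypothetical stand-ins for a real HLS run and its report. At each step the search finds the slowest stage, varies only the parameter assumed to affect that stage, and falls back to the next-slowest stage if no change helps.

```python
import math
from typing import Callable, Dict, Optional, Tuple

Design = Dict[str, int]
Report = Tuple[float, Dict[str, float]]  # (total cycles, cycles per stage)

# Hypothetical design space: allowed values for each tunable parameter.
OPTIONS = {
    "load_width": [1, 2, 4, 8],
    "unroll": [1, 2, 4, 8, 16],
    "store_width": [1, 2, 4, 8],
}
# Hypothetical mapping from a bottleneck stage to the parameter that affects it.
PARAM_FOR_STAGE = {"load": "load_width", "compute": "unroll", "store": "store_width"}
RESOURCE_BUDGET = 20  # assumed limit on the sum of all parameter values


def toy_evaluate(design: Design) -> Report:
    """Stand-in for an HLS run: per-stage cycles, infinite cost if over budget."""
    stages = {
        "load": 1024 / design["load_width"],
        "compute": 4096 / design["unroll"],
        "store": 512 / design["store_width"],
    }
    feasible = sum(design.values()) <= RESOURCE_BUDGET
    return (sum(stages.values()) if feasible else math.inf), stages


def best_move_for(param: str, design: Design, best_cost: float,
                  evaluate: Callable[[Design], Report]
                  ) -> Optional[Tuple[float, Design, Dict[str, float]]]:
    """Coordinate step: vary only `param`, keep the best strict improvement."""
    best = None
    for value in OPTIONS[param]:
        candidate = {**design, param: value}
        cost, stages = evaluate(candidate)
        if cost < best_cost:
            best_cost, best = cost, (cost, candidate, stages)
    return best


def bottleneck_guided_search(evaluate: Callable[[Design], Report],
                             start: Design, max_steps: int = 50
                             ) -> Tuple[Design, float]:
    design = dict(start)
    cost, stages = evaluate(design)
    for _ in range(max_steps):
        # Visit stages from slowest to fastest; the slowest is the bottleneck.
        for stage in sorted(stages, key=stages.get, reverse=True):
            move = best_move_for(PARAM_FOR_STAGE[stage], design, cost, evaluate)
            if move is not None:
                cost, design, stages = move
                break  # re-detect the bottleneck after every accepted move
        else:
            break  # no stage could be improved: stop at this local optimum
    return design, cost


best_design, best_cost = bottleneck_guided_search(
    toy_evaluate, {"load_width": 1, "unroll": 1, "store_width": 1})
print(best_design, best_cost)
```

Because each step changes a single parameter and accepts only strict improvements, this kind of search can stop at a local optimum; in the toy model above, the greedy choice of a large `unroll` value early on consumes most of the assumed budget. A practical framework would need additional mechanisms to handle this, which the sketch does not attempt to model.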