Early DSE and Automatic Generation of Coarse Grained Merged Accelerators

Post-Moore's law area-constrained systems rely on accelerators to deliver performance enhancements. Coarse-grained accelerators can offer substantial domain acceleration, but manual, ad-hoc identification of code to accelerate is prohibitively expensive. Because cycle-accurate simulators and high-level synthesis (HLS) flows are so time-consuming, manual creation of high-utilization accelerators that exploit control and data flow patterns at optimal granularities is rarely successful. To address these challenges, we present AccelMerger, the first automated methodology to create coarse-grained, control- and data-flow-rich, merged accelerators. AccelMerger uses sequence alignment matching to recognize similar function call-graphs and loops, and neural networks to quickly evaluate their post-HLS characteristics. It accurately identifies which functions to accelerate, and it merges accelerators to respect an area budget and to accommodate system communication characteristics like latency and bandwidth. Merging two accelerators can save as much as 99% of the area of one. An integer linear program, solved to global optimality, uses the saved area to allocate more accelerators for increased performance. We demonstrate AccelMerger's effectiveness using HLS flows without any manual effort to fine-tune the resulting designs. On FPGA-based systems, AccelMerger yields application performance improvements of up to 16.7x over software implementations, and 1.91x on average with respect to state-of-the-art early-stage design space exploration tools.
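To make the area-budget trade-off concrete, the following is a minimal Python sketch of why merging can free room for an additional accelerator. It is not AccelMerger's formulation: the abstract does not give the integer linear program, so this sketch swaps in an exhaustive search over subsets, which is only practical for a handful of candidates. The function name `select_accelerators`, the candidate names, the areas, the gains, the pairwise merge saving, and the additive gain model are all hypothetical assumptions chosen for illustration.

```python
from itertools import combinations


def select_accelerators(candidates, merge_savings, area_budget):
    """Pick the subset of candidates with the highest total gain that fits the budget.

    candidates:    dict name -> (area, gain); units are arbitrary (assumed).
    merge_savings: dict frozenset({a, b}) -> area saved when a and b are both
                   selected and merged (assumed pairwise and non-overlapping).
    area_budget:   maximum total area allowed.

    Exhaustive search stands in for an ILP solver here; it is exponential in
    the number of candidates and is meant only as an illustration.
    """
    names = sorted(candidates)
    best_set, best_gain, best_area = frozenset(), 0, 0
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            chosen = frozenset(subset)
            # Unmerged area of everything selected.
            area = sum(candidates[n][0] for n in chosen)
            # Subtract the saving of every merge pair fully contained in the selection.
            area -= sum(s for pair, s in merge_savings.items() if pair <= chosen)
            # Assumed additive gain model: gains of selected accelerators add up.
            gain = sum(candidates[n][1] for n in chosen)
            if area <= area_budget and gain > best_gain:
                best_set, best_gain, best_area = chosen, gain, area
    return best_set, best_gain, best_area


# Hypothetical candidates: name -> (area, gain).
CANDIDATES = {"fir": (100, 5), "iir": (100, 4), "fft": (150, 6)}
# Hypothetical saving: merging fir and iir recovers 90 area units.
MERGES = {frozenset({"fir", "iir"}): 90}

if __name__ == "__main__":
    print(select_accelerators(CANDIDATES, {}, 260))      # no merging allowed
    print(select_accelerators(CANDIDATES, MERGES, 260))  # merging allowed
```

With these made-up numbers, the budget of 260 fits only two accelerators when merging is disallowed, while the merged `fir`/`iir` pair leaves enough area to include `fft` as well. A real flow would replace the brute-force loop with an ILP solver and would use predicted post-HLS area and latency instead of hand-picked constants.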
