A High-level Synthesis Toolchain for the Julia Language

With the push towards Exascale computing and data-driven methods, problem sizes have increased dramatically, and with them the computational requirements of the underlying algorithms. This has driven the offloading of computations to general-purpose hardware accelerators such as GPUs and TPUs, and renewed interest in designing problem-specific accelerators using FPGAs. However, the development process for these problem-specific accelerators currently suffers from the "two-language problem": algorithms are developed in one (usually higher-level) language, but the kernels are implemented in another language at a completely different level of abstraction that requires fundamentally different expertise. To address this problem, we propose a new MLIR-based compiler toolchain that unifies the development process by automatically compiling kernels written in the Julia programming language into SystemVerilog, without the need for any additional directives or language customisations. Our toolchain supports both dynamic and static scheduling, integrates directly with the AXI4-Stream protocol to interface with subsystems such as on- and off-chip memory, and generates vendor-agnostic RTL. This prototype toolchain is able to synthesize a set of signal processing and mathematical benchmarks that operate at 100 MHz on real FPGA devices, achieving between 59.71% and 82.6% of the throughput of designs generated by state-of-the-art toolchains that only compile from low-level languages such as C or C++. Overall, this toolchain allows domain experts to write compute kernels in Julia as they normally would, and then retarget them to an FPGA without additional pragmas or modifications.
