Heterogeneous computers integrate general-purpose host processors with domain-specific accelerators to combine versatility with efficiency and high performance. To realize the full potential of heterogeneous computers, however, many hardware and software design challenges have to be overcome. While architectural and system simulators can be used to analyze heterogeneous computers, they face unavoidable trade-offs between simulation speed and performance modeling accuracy. In this work, we present HEROv2, an FPGA-based research platform that enables accurate and fast exploration of heterogeneous computers consisting of accelerators based on clusters of 32-bit RISC-V cores and an application-class 64-bit ARMv8 or RV64 host processor. HEROv2 enables seamless data sharing between 64-bit hosts and 32-bit accelerators and comes with a fully open-source on-chip network, a unified heterogeneous programming interface, and a mixed-data-model, mixed-ISA heterogeneous compiler based on LLVM. We evaluate HEROv2 in four case studies that span the application level, the toolchain, and the system architecture down to the accelerator microarchitecture. We demonstrate how HEROv2 enables effective research and development on the full stack of heterogeneous computing. For instance, the compiler can tile loops and infer data transfers to and from the accelerators, which yields a speedup of up to 4.4x over the original program; in most cases the result is only 15% slower than a handwritten implementation, which requires 2.6x more code.
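To make the loop tiling and data transfers mentioned above concrete, the following is a minimal hand-written sketch of the general pattern, not HEROv2's compiler output or programming interface. The function names, the tile size `TILE`, and the use of `memcpy` as a stand-in for DMA transfers into accelerator-local memory are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size (in elements) chosen to fit in local memory. */
#define TILE 64

/* Baseline: every access goes directly to main memory. */
void scale_add_naive(const float *x, float *y, float a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Tiled variant: stage one tile at a time in local buffers, compute on
 * the local copy, then write the result back. memcpy is only a stand-in
 * for the DMA transfers a real accelerator would issue. */
void scale_add_tiled(const float *x, float *y, float a, size_t n) {
    float x_loc[TILE], y_loc[TILE]; /* stand-ins for local scratchpad memory */

    for (size_t base = 0; base < n; base += TILE) {
        /* The last tile may be shorter than TILE. */
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        memcpy(x_loc, x + base, len * sizeof(float)); /* transfer in */
        memcpy(y_loc, y + base, len * sizeof(float));

        for (size_t i = 0; i < len; ++i)
            y_loc[i] = a * x_loc[i] + y_loc[i];

        memcpy(y + base, y_loc, len * sizeof(float)); /* transfer out */
    }
}
```

Both functions compute the same result. The tiled version makes explicit the staging code that a programmer would otherwise write by hand, which is the kind of extra code the handwritten-implementation comparison refers to.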