With the continued growth in field-programmable gate array (FPGA) capacity and the incorporation of FPGAs into new environments such as datacenters, we have witnessed the introduction of a new class of reconfigurable acceleration devices (RADs) that go beyond conventional FPGA architectures. These devices combine a reconfigurable fabric with coarse-grained, domain-specialized accelerator blocks, all connected via a high-performance packet-switched network-on-chip (NoC) for efficient system-wide communication. However, we lack the tools needed to efficiently explore the huge design space of RADs, study the complex interactions between their different components, and evaluate various combinations of design choices. In this work, we develop RAD-Sim, a cycle-level architecture simulator that enables rapid, application-driven exploration of the design space of novel RADs. To showcase the capabilities of RAD-Sim, we map and simulate a state-of-the-art deep learning (DL) inference overlay on a RAD instance that incorporates an FPGA fabric and a complex of hard matrix-vector multiplication engines communicating over a system-wide NoC. Through this example, we show how RAD-Sim can help architects quantify the effect of changing specific architecture parameters on end-to-end application performance.
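To make "cycle-level simulation" and "quantifying the effect of an architecture parameter" concrete, here is a toy sketch. It is not RAD-Sim's code or API. The three-stage pipeline (fabric injects packets, a NoC delivers them after a fixed latency with no contention, hard matrix-vector multiplication engines process them), the `RadConfig` parameters and their default values, and the `simulate` function are all hypothetical illustrations. The model advances one clock cycle at a time and reports the cycle on which the last packet finishes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RadConfig:
    # Hypothetical parameters for illustration only (not RAD-Sim's).
    noc_latency: int = 4          # cycles for a packet to cross the NoC (no contention modeled)
    injection_per_cycle: int = 1  # packets the fabric can inject per cycle
    mvm_cycles: int = 8           # cycles one hard MVM engine spends per packet
    num_mvm_engines: int = 2      # number of hard MVM engines


def simulate(cfg: RadConfig, num_packets: int) -> int:
    """Step a toy fabric -> NoC -> MVM-engine pipeline cycle by cycle and
    return the cycle on which the last packet finishes processing."""
    to_send = num_packets
    in_flight = deque()  # arrival cycles of packets still traversing the NoC
    queued = 0           # packets delivered to the engines but not yet started
    busy_until = [0] * cfg.num_mvm_engines
    started = 0
    last_finish = 0
    cycle = 0
    while started < num_packets:
        # 1. The fabric injects up to injection_per_cycle packets into the NoC.
        n = min(cfg.injection_per_cycle, to_send)
        for _ in range(n):
            in_flight.append(cycle + cfg.noc_latency)
        to_send -= n
        # 2. Packets whose NoC traversal is complete join the engine queue.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            queued += 1
        # 3. Each idle engine starts on one queued packet.
        for i in range(cfg.num_mvm_engines):
            if queued and busy_until[i] <= cycle:
                queued -= 1
                started += 1
                busy_until[i] = cycle + cfg.mvm_cycles
                last_finish = max(last_finish, busy_until[i])
        cycle += 1
    return last_finish


if __name__ == "__main__":
    # Sweep one architecture parameter and observe end-to-end cycles.
    for engines in (1, 2, 4):
        print(engines, simulate(RadConfig(num_mvm_engines=engines), 4))
    # prints: 1 36, 2 21, 4 15 (one pair per line)
```

In this toy model, adding engines helps until NoC delivery, rather than compute, becomes the bottleneck; a real simulator would also model NoC contention, routing, and clock-domain crossings, which this sketch deliberately omits.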