Field-programmable gate arrays (FPGAs) are widely used to accelerate deep learning applications because of their reconfigurability, flexibility, and fast time-to-market. However, conventional FPGAs suffer from a tradeoff between chip area and reconfiguration latency, so efficient FPGA acceleration that requires switching between multiple configurations remains elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff. Compared with conventional designs, our design incurs no additional area cost and consumes less power. It also provides dynamic reconfiguration, which can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of the ferroelectric FET (FeFET), we propose and experimentally verify compact FPGA primitives, including a 1FeFET look-up table (LUT) cell and a 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of each primitive are placed in parallel, which allows an arbitrary configuration to be loaded without interrupting execution of the active configuration. A comprehensive evaluation shows that, compared with the SRAM-based FPGA, our dynamic reconfiguration design reduces LUT/CB area by 63.0%/71.1% and CB/SB power consumption by 82.7%/53.6%, with a minimal critical path delay penalty of 9.6%. We further implement a Super-Sub network model to demonstrate the benefit of our design's context-switching capability. We also evaluate the timing performance of our design against conventional FPGAs in various application scenarios. In a scenario where users switch between two preloaded configurations, our design saves 78.7% of the time on average. In a scenario where multiple configurations are implemented with dynamic reconfiguration, our design saves 20.3% of the time on average.
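The dynamic-reconfiguration idea (two local copies per primitive, one active while the other is loaded) can be illustrated with a small behavioral model. The sketch below is a hypothetical Python model, not the authors' FeFET circuit: `DualContextLUT`, `total_time_conventional`, and `total_time_dynamic` are illustrative names. It assumes the context switch is instantaneous and that only one configuration load runs at a time. The timing values are arbitrary units, not measured results.

```python
from typing import List


class DualContextLUT:
    """Hypothetical k-input LUT with two configuration copies (active and shadow)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.planes = [[0] * (2 ** k), [0] * (2 ** k)]
        self.active = 0  # index of the copy that currently drives the output

    def load_shadow(self, truth_table: List[int]) -> None:
        # Write only the inactive copy; evaluate() keeps using the active one.
        if len(truth_table) != 2 ** self.k:
            raise ValueError("truth table must have 2**k entries")
        self.planes[1 - self.active] = list(truth_table)

    def switch(self) -> None:
        # Context switch: select the other copy (assumed instantaneous here).
        self.active = 1 - self.active

    def evaluate(self, inputs: List[int]) -> int:
        # The input bits form the address into the active truth table (inputs[0] = LSB).
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return self.planes[self.active][addr]


def total_time_conventional(exec_times: List[float], reconfig_times: List[float]) -> float:
    # Each configuration is loaded and then executed; nothing overlaps.
    return sum(r + e for r, e in zip(reconfig_times, exec_times))


def total_time_dynamic(exec_times: List[float], reconfig_times: List[float]) -> float:
    # The first load is fully exposed. Each later load goes into the shadow copy
    # while the previous configuration executes, so only the portion of the load
    # that exceeds that execution time adds to the total.
    total = reconfig_times[0] + exec_times[0]
    for i in range(1, len(exec_times)):
        total += max(0, reconfig_times[i] - exec_times[i - 1]) + exec_times[i]
    return total


if __name__ == "__main__":
    lut = DualContextLUT(k=2)
    lut.load_shadow([0, 0, 0, 1])  # AND
    lut.switch()                   # AND becomes active
    lut.load_shadow([0, 1, 1, 0])  # XOR is loaded into the shadow copy
    print(lut.evaluate([1, 1]))    # 1 (AND is still active)
    lut.switch()
    print(lut.evaluate([1, 1]))    # 0 (XOR is now active)

    print(total_time_conventional([10, 10, 10], [4, 4, 4]))  # 42
    print(total_time_dynamic([10, 10, 10], [4, 4, 4]))       # 34
```

In this toy model, the reconfiguration cost is fully hidden whenever each load is no longer than the execution it overlaps. A load that is longer than that execution exposes only the difference.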