The rapid updates in error-resilient applications, along with their demand for high throughput, have motivated the design of fast approximate functional units for Field-Programmable Gate Arrays (FPGAs). Prior studies proposing imprecise functional units suffer from three shortcomings. First, most inexact multipliers and dividers are specialized for Application-Specific Integrated Circuit (ASIC) platforms. Second, state-of-the-art (SoA) approximate units are mostly substituted into a single kernel of a multi-kernel application; moreover, the end-to-end assessment covers only the Quality of Results (QoR), not the overall performance gain. Finally, existing imprecise components are not designed to support pipelining, which could boost the operating frequency and throughput of, e.g., applications that include division. In this paper, we propose RAPID, the first pipelined approximate multiplier and divider architecture customized for FPGAs. The proposed units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell's approximate algorithms. Our novel error-refinement scheme not only incurs negligible overhead over the baseline Mitchell's approach but also boosts its accuracy to 99.4% for multiplication and division of arbitrary operand size. Experimental results demonstrate the efficiency of the proposed pipelined and non-pipelined RAPID multipliers and dividers over their accurate counterparts. Moreover, end-to-end evaluations of RAPID, deployed in three multi-kernel applications in the domains of bio-signal processing, image processing, and moving-object tracking for Unmanned Aerial Vehicles (UAVs), indicate improvements of up to 45% in area, latency, and Area-Delay-Product (ADP) over accurate kernels, with negligible loss in QoR.
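For readers unfamiliar with the baseline that RAPID builds on, the following is a minimal software sketch of Mitchell's logarithmic approximation for unsigned multiplication and division. It is only an illustrative model: the function names `mitchell_mul` and `mitchell_div` are hypothetical, and the sketch includes neither the error-refinement scheme nor the 6-LUT, carry-chain, and pipelined mapping proposed in the paper. It relies on writing each operand as 2^k (1 + x) with 0 <= x < 1 and approximating log2(1 + x) by x.

```python
def mitchell_mul(a: int, b: int) -> int:
    """Baseline Mitchell approximation of a * b for non-negative integers."""
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1  # leading-one positions
    f1, f2 = a - (1 << k1), b - (1 << k2)            # fractions x1*2^k1, x2*2^k2
    one = 1 << (k1 + k2)                             # 1.0 at scale 2^(k1+k2)
    s = (f1 << k2) + (f2 << k1)                      # (x1 + x2) at the same scale
    if s < one:
        return one + s       # 2^(k1+k2) * (1 + x1 + x2)
    return 2 * s             # 2^(k1+k2+1) * (x1 + x2)


def mitchell_div(a: int, b: int) -> float:
    """Baseline Mitchell approximation of a / b for non-negative integers."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    if a == 0:
        return 0.0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    x1 = a / (1 << k1) - 1.0                         # fractional part of log2(a)
    x2 = b / (1 << k2) - 1.0
    d = x1 - x2
    if d >= 0:
        return 2.0 ** (k1 - k2) * (1.0 + d)
    return 2.0 ** (k1 - k2 - 1) * (2.0 + d)          # borrow from the exponent


if __name__ == "__main__":
    print(mitchell_mul(5, 6), 5 * 6)
    print(mitchell_div(7, 5), 7 / 5)
```

In this unrefined form the multiplier never overestimates and its relative error can reach about 11.1%, while the divider never underestimates and its relative error can reach 12.5%; this is the gap an error-refinement step is meant to close.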