HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices

For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks (3D CNNs) have proven highly effective, achieving state-of-the-art results. This study introduces a novel toolflow, based on a streaming architecture, for mapping such models onto FPGAs, taking into account both the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises five parts: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. Experiments on various pairs of 3D CNNs and FPGA systems show that the toolflow supports a broad range of models and devices. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers competitive latency compared to a range of state-of-the-art hand-tuned approaches, achieving up to 5$\times$ better performance than some existing works.
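To make the idea of a performance model concrete, the following is a minimal sketch of a first-order latency estimate for a single 3D convolution layer. It is not HARFLOW3D's actual model, which the abstract does not describe; the class `Conv3D`, the function `latency_ms`, and the parameters `parallel_macs` and `clock_mhz` are all hypothetical names introduced here. The sketch assumes the layer is purely compute-bound and that the hardware performs a fixed number of multiply-accumulates (MACs) per clock cycle.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Conv3D:
    """Hypothetical description of one 3D convolution layer."""
    in_channels: int
    out_channels: int
    kernel: tuple      # (k_depth, k_height, k_width)
    out_shape: tuple   # (out_depth, out_height, out_width)

    def macs(self) -> int:
        # One MAC per kernel element, per input channel,
        # per output channel, per output voxel.
        kd, kh, kw = self.kernel
        d, h, w = self.out_shape
        return self.in_channels * self.out_channels * kd * kh * kw * d * h * w


def latency_ms(layer: Conv3D, parallel_macs: int, clock_mhz: float) -> float:
    """Compute-bound latency estimate (assumption: no memory stalls)."""
    cycles = ceil(layer.macs() / parallel_macs)
    # clock_mhz * 1e3 is the number of cycles per millisecond.
    return cycles / (clock_mhz * 1e3)


if __name__ == "__main__":
    layer = Conv3D(in_channels=3, out_channels=64,
                   kernel=(3, 3, 3), out_shape=(16, 112, 112))
    print(f"{layer.macs()} MACs, "
          f"{latency_ms(layer, parallel_macs=1024, clock_mhz=200):.3f} ms")
```

A latency-oriented optimizer would, under these assumptions, search over values such as `parallel_macs` subject to the device's resource limits; a real model would also need to account for memory bandwidth and on-chip buffering, which this sketch ignores.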

