For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks have proven highly effective, achieving state-of-the-art results. This study introduces a novel streaming-architecture-based toolflow for mapping such models onto FPGAs, taking into account both the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises several parts: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers competitive latency compared to a range of state-of-the-art hand-tuned approaches, achieving up to 5$\times$ better performance than some existing works.