The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a particular task. Efficiently mapping multiple CNNs onto a single FPGA device is challenging, as the allocation of compute resources and external memory bandwidth needs to be optimised at design time. This paper proposes f-CNN$^{\text{x}}$, an automated toolflow for the optimised mapping of multiple CNNs on FPGAs. It comprises a novel multi-CNN hardware architecture together with an automated design space exploration method that considers the user-specified performance requirements of each model to allocate compute resources and generate a synthesisable accelerator. Moreover, f-CNN$^{\text{x}}$ employs a novel scheduling algorithm that alleviates the limitations imposed by memory bandwidth contention between CNNs and sustains high utilisation of the architecture. Experimental evaluation shows that f-CNN$^{\text{x}}$'s designs outperform contention-unaware FPGA mappings by up to 50% and deliver up to 6.8x higher performance-per-Watt over highly optimised GPU designs for multi-CNN systems.