We present a new, efficient OpenCL-based accelerator for large-scale convolutional neural networks, called Fast Inference on FPGAs for Convolutional Neural Networks (FFCNN). FFCNN is built on a deeply pipelined OpenCL kernel architecture. As pointed out previously, high-level synthesis tools such as the OpenCL framework make it easy to port code originally designed for CPUs and GPUs to FPGAs, but it remains difficult to make OpenCL code run efficiently on FPGAs. This work aims to provide an efficient FPGA implementation of OpenCL high-performance computing applications. To this end, data-reuse and task-mapping techniques are also presented to improve design efficiency. In addition, the following motivations were taken into account when developing FFCNN: 1) FFCNN is designed to be easily implemented with the Intel FPGA SDK for OpenCL design flow. 2) FFCNN integrates several techniques to improve memory bandwidth and throughput. A performance analysis is conducted on two deep CNNs for large-scale image classification. The obtained results, and the comparison with other works designed to accelerate the same types of architectures, show that the proposed accelerator design is efficient and competitive, with significantly improved performance and resource utilization.
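To illustrate what a deeply pipelined OpenCL kernel architecture looks like in practice, the following device-code sketch connects two kernels with the Intel FPGA OpenCL channels extension so that they execute concurrently as producer and consumer. This is a minimal illustration, not code from FFCNN itself: the kernel and channel names (`load_features`, `activate`, `feat_ch`) are hypothetical, and the consumer applies a toy ReLU stage. The fragment requires an Intel FPGA OpenCL toolchain and a host program, so it is shown as a sketch only.

```c
// Hypothetical sketch of two pipelined kernels communicating through
// an on-chip channel (Intel FPGA OpenCL channels extension).
#pragma OPENCL EXTENSION cl_intel_channels : enable

// FIFO channel buffering feature-map values between kernel stages.
channel float feat_ch __attribute__((depth(64)));

// Producer stage: streams input feature-map values into the channel.
__kernel void load_features(__global const float *restrict in, int n) {
    for (int i = 0; i < n; i++)
        write_channel_intel(feat_ch, in[i]);
}

// Consumer stage: reads from the channel and applies a ReLU,
// overlapping its execution with the producer above.
__kernel void activate(__global float *restrict out, int n) {
    for (int i = 0; i < n; i++) {
        float v = read_channel_intel(feat_ch);
        out[i] = v > 0.0f ? v : 0.0f;
    }
}
```

Because the two kernels exchange data through an on-chip FIFO rather than external memory, intermediate results never leave the chip, which is one common way such designs reduce memory-bandwidth pressure.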