FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

Deep learning and convolutional neural networks (CNNs) have become increasingly popular and important in both academia and industry in recent years because they provide better accuracy in classification, detection, and recognition tasks than traditional approaches. Many popular frameworks for deep learning development are currently available, such as Caffe, TensorFlow, and PyTorch; most of them natively support CPUs and treat GPUs as the mainline accelerator by default. FPGA devices, although viewed as a potential heterogeneous platform, still lack comprehensive support for CNN development in popular frameworks, particularly for the training phase. In this paper, we first propose FeCaffe, i.e., FPGA-enabled Caffe, a hierarchical software and hardware design methodology based on Caffe that enables FPGAs to support mainline deep learning development features, e.g., training and inference with Caffe. Furthermore, we provide benchmarks of FeCaffe using several classical CNN networks as examples, together with a detailed analysis of kernel execution time. Finally, we propose optimization directions at the FPGA kernel design, system pipeline, network architecture, use case application, and heterogeneous platform levels to progressively improve the performance and efficiency of FeCaffe. The results demonstrate that FeCaffe can support almost the full feature set of CNN training and inference, with a high degree of design flexibility, extensibility, and reusability for deep learning development. Compared to prior studies, our architecture supports more networks and training settings, and the current configuration achieves 6.4x and 8.4x average execution time improvements for the forward and backward passes, respectively, on LeNet.