Convolutional Neural Networks (CNNs) have gained significant traction in machine learning, particularly because of their high accuracy in visual recognition. Recent work has pushed the performance of GPU implementations of CNNs, significantly improving classification and training times. With these improvements, many frameworks have become available for implementing CNNs on both CPUs and GPUs, but none support FPGA implementations. In this work we present a modified version of the popular CNN framework Caffe with FPGA support. The framework allows classification using CNN models and specialized FPGA implementations, with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations. To validate the framework, we use the Xilinx SDAccel environment to implement an FPGA-based Winograd convolution engine and show that the FPGA layer can be used alongside other layers running on a host processor to run several popular CNNs (AlexNet, GoogLeNet, VGG A, OverFeat). The results show that our framework achieves 50 GFLOPS across 3x3 convolutions in the benchmarks. This is achieved within a practical framework, which will aid future development of FPGA-based CNNs.
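For readers unfamiliar with Winograd convolution, the following is a minimal sketch of the one-dimensional minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter using four multiplications instead of six. It is an illustration only, not the FPGA engine described above: the function names are hypothetical, and the abstract does not state which tile size the engine uses. A 3x3 two-dimensional convolution can be built by nesting this 1D scheme.

```python
# Illustrative sketch of Winograd F(2,3); not the paper's FPGA implementation.
# Function names are hypothetical and chosen for this example only.

def winograd_f23(d, g):
    """Compute 2 outputs of a 3-tap filter g over 4 inputs d with 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms; these can be precomputed once per filter.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # The four element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same 2 outputs computed directly with 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(winograd_f23(d, g), direct_f23(d, g))
```

The saving comes from trading multiplications for additions, which is why the approach suits the small 3x3 kernels the benchmarks focus on.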