FPGAs are well suited to fixed-point neural network computation thanks to their high power efficiency and configurability. However, a design must be carefully tuned to achieve high performance within limited hardware resources. We present an FPGA-based neural network accelerator and its optimization framework, which achieve near-optimal efficiency across various CNN models and FPGA resource budgets. Targeting high throughput, we adopt a layer-wise pipeline architecture for higher DSP utilization. To reach optimal performance, we also propose a flexible algorithm that allocates balanced hardware resources to each layer, supported by an activation buffer design. Across well-balanced implementations of four CNN models on the ZC706, DSP utilization and efficiency both exceed 90%. For VGG16 on the ZC706, the proposed accelerator achieves 2.58x, 1.53x, and 1.35x better performance than the referenced non-pipelined architecture [1] and the pipelined architectures of [2] and [3], respectively.
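To illustrate the idea of balanced per-layer resource allocation in a layer-wise pipeline, the sketch below assigns DSPs to each layer in proportion to its MAC workload and then rebalances greedily so that no single stage dominates pipeline latency. This is a minimal illustration of the general technique, not the paper's actual algorithm; the function name, workloads, and greedy rebalancing strategy are all assumptions for demonstration.

```python
# Hypothetical sketch: balance DSPs across pipeline stages so that
# per-stage latency (MACs per allocated DSP) is roughly equal.
# Not the paper's algorithm; values and heuristic are illustrative.

def allocate_dsps(layer_macs, total_dsps):
    """Assign DSPs to each layer proportional to its MAC count,
    then greedily rebalance to hit the total DSP budget exactly."""
    total_macs = sum(layer_macs)
    # Initial proportional allocation (at least 1 DSP per layer).
    alloc = [max(1, round(total_dsps * m / total_macs)) for m in layer_macs]

    def stage_cycles(i):
        # Idealized stage latency: work divided by parallelism.
        return layer_macs[i] / alloc[i]

    # If over budget, shed DSPs from the fastest stage (keeping >= 1).
    while sum(alloc) > total_dsps:
        i = min((j for j in range(len(alloc)) if alloc[j] > 1),
                key=stage_cycles)
        alloc[i] -= 1
    # If under budget, give spare DSPs to the bottleneck stage.
    while sum(alloc) < total_dsps:
        i = max(range(len(alloc)), key=stage_cycles)
        alloc[i] += 1
    return alloc

# Example: a 3-layer toy network with MAC counts per layer.
print(allocate_dsps([100, 200, 100], 8))  # proportional split: [2, 4, 2]
```

With equal per-stage cycles, the pipeline's throughput is maximized for the given DSP budget, which is the intuition behind balancing resources layer by layer rather than allocating them uniformly.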