Binary Complex Neural Network Acceleration on FPGA

Being able to learn from complex data with phase information is imperative for many signal processing applications. Today's real-valued deep neural networks (DNNs) are efficient at latent information analysis but fall short when applied to the complex domain. Deep complex networks (DCNs), in contrast, can learn from complex data but have high computational costs; therefore, they cannot satisfy the instant decision-making requirements of many deployable systems that deal with short observations or short signal bursts. Recently, the Binarized Complex Neural Network (BCNN), which integrates DCNs with binarized neural networks (BNNs), has shown great potential for classifying complex data in real time. In this paper, we propose a structural-pruning-based accelerator for BCNN that provides an inference throughput of more than 5000 frames/s on edge devices. The high performance comes from both the algorithm and hardware sides. On the algorithm side, we apply structural pruning to the original BCNN models and obtain a 20$\times$ pruning rate with negligible accuracy loss; on the hardware side, we propose a novel 2D convolution operation accelerator for the binary complex neural network. Experimental results show that the proposed design operates at over 90% utilization and achieves inference throughputs of 5882 frames/s and 4938 frames/s for complex NIN-Net and ResNet-18, respectively, on the CIFAR-10 dataset using an Alveo U280 board.
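The abstract does not spell out how a binary complex 2D convolution is computed, so the following is only a minimal sketch under stated assumptions: real and imaginary parts are each binarized to +1/-1 with a sign function, and the complex product is expanded into four real-valued convolutions. The function names (`binarize`, `conv2d_valid`, `binary_complex_conv2d`) are hypothetical and are not taken from the paper; the accelerator's actual dataflow on the FPGA may differ.

```python
import numpy as np

def binarize(x):
    """Map each entry to +1 or -1 (assumption: zero is treated as +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int32)

def conv2d_valid(x, w):
    """Plain 'valid' 2D cross-correlation of one feature map with one kernel."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.int32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def binary_complex_conv2d(x, w):
    """Complex convolution with binarized real and imaginary parts.

    Uses (xr + j*xi) * (wr + j*wi) = (xr*wr - xi*wi) + j*(xr*wi + xi*wr),
    so one complex convolution becomes four real binary convolutions.
    """
    xr, xi = binarize(x.real), binarize(x.imag)
    wr, wi = binarize(w.real), binarize(w.imag)
    real = conv2d_valid(xr, wr) - conv2d_valid(xi, wi)
    imag = conv2d_valid(xr, wi) + conv2d_valid(xi, wr)
    return real + 1j * imag

# Usage: a 3x3 complex input and a 2x2 complex kernel.
x = np.ones((3, 3)) + 1j * np.ones((3, 3))
w = np.ones((2, 2)) - 1j * np.ones((2, 2))
y = binary_complex_conv2d(x, w)
```

Because every operand is +1 or -1, each of the four real convolutions could in principle be mapped to XNOR and popcount logic in hardware, which is the usual motivation for binarization; whether the proposed accelerator does exactly this is not stated in the abstract.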
