Deep convolutional neural networks (CNNs) are computationally and memory intensive. Their heavy reliance on multiplication makes it difficult to deploy inference efficiently on resource-constrained edge devices. This paper proposes GhostShiftAddNet, a hardware-efficient deep network: a multiplication-free CNN with fewer redundant features. We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block to cheap operations. The bottleneck first applies an appropriate number of bit-shift filters to produce intrinsic feature maps, and then applies a series of transformations, composed of bit-wise shifts and additions, to generate further feature maps that learn to capture the information underlying the intrinsic features. We schedule the numbers of bit-shift and addition operations to suit different hardware platforms. We conduct extensive experiments and ablation studies, with implementations and measurements on both desktop and embedded (Jetson Nano) devices. We demonstrate that the proposed GhostSA block can replace the bottleneck blocks in the backbones of state-of-the-art network architectures and improves performance on image classification benchmarks. Furthermore, GhostShiftAddNet achieves higher classification accuracy than GhostNet with fewer FLOPs and parameters (reduced by up to 3x). Compared with GhostNet, inference latency on the Jetson Nano improves by 1.3x on its GPU and 2x on its CPU.
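The following is a minimal illustrative sketch of the structure described above, not the authors' implementation. It assumes PyTorch and emulates bit-shift filters by projecting weights onto signed powers of two (so each multiplication could be realised in hardware as a shift and an add); the names `ShiftConv2d` and `GhostSABlock`, and the choice of a depthwise convolution for the cheap ghost transformation, are assumptions made for illustration.

```python
# Minimal sketch of a GhostSA-style block (illustrative only, not the paper's code).
# Bit-shifts are emulated in floating point by rounding weights to signed powers of two.
import torch
import torch.nn as nn
import torch.nn.functional as F


def round_to_power_of_two(w: torch.Tensor) -> torch.Tensor:
    """Project weights onto {+/- 2^k}; multiplying by such a weight is a bit-shift."""
    sign = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    exp = torch.round(torch.log2(w.abs().clamp(min=1e-8)))
    return sign * torch.pow(2.0, exp)


class ShiftConv2d(nn.Conv2d):
    """Convolution whose weights are constrained to signed powers of two
    (emulated here; deployable as shift-and-add on hardware)."""

    def forward(self, x):
        w = round_to_power_of_two(self.weight)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


class GhostSABlock(nn.Module):
    """Intrinsic feature maps from a shift convolution, ghost feature maps from a
    cheap depthwise shift transformation, concatenated GhostNet-style."""

    def __init__(self, in_ch, out_ch, ratio=2, kernel_size=1, dw_size=3):
        super().__init__()
        intrinsic_ch = out_ch // ratio            # number of intrinsic maps
        ghost_ch = out_ch - intrinsic_ch          # maps produced by cheap transforms
        self.primary = nn.Sequential(
            ShiftConv2d(in_ch, intrinsic_ch, kernel_size,
                        padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(intrinsic_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            ShiftConv2d(intrinsic_ch, ghost_ch, dw_size,
                        padding=dw_size // 2, groups=intrinsic_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)


if __name__ == "__main__":
    y = GhostSABlock(16, 32)(torch.randn(1, 16, 56, 56))
    print(y.shape)  # torch.Size([1, 32, 56, 56])
```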