Deep convolutional neural networks (CNNs) are computationally and memory intensive. The dense multiplications in CNNs make it difficult to deploy inference efficiently on resource-constrained edge devices. This paper proposes GhostShiftAddNet, motivated by the goal of a hardware-efficient deep network: a multiplication-free CNN with fewer redundant features. We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block into cheap operations. The bottleneck applies an appropriate number of bit-shift filters to process the intrinsic feature maps, then applies a series of transformations consisting of bit-wise shifts and addition operations to generate further feature maps that fully capture the information underlying the intrinsic features. The number of bit-shift and addition operations can be scheduled for different hardware platforms. We conduct extensive experiments and ablation studies, with implementation and measurements on both desktop and embedded (Jetson Nano) devices. We show that the proposed GhostSA block can replace the bottleneck blocks in the backbones of state-of-the-art network architectures while improving performance on image classification benchmarks. Moreover, GhostShiftAddNet achieves higher classification accuracy than GhostNet with fewer FLOPs and parameters (reduced by up to 3x). Compared with GhostNet, inference latency on the Jetson Nano is improved by 1.3x on the GPU and 2x on the CPU. Code is available open source at \url{https://github.com/JIABI/GhostShiftAddNet}.
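As a rough illustration of the arithmetic described above (not the authors' released implementation; see the linked repository for that), the following PyTorch sketch builds a GhostSA-style block: a primary convolution whose weights are quantised to signed powers of two, so each multiplication reduces to a bit-shift at inference time, followed by a cheap depthwise shift-style convolution that generates additional "ghost" feature maps from the intrinsic ones. The names \texttt{GhostSABlock}, \texttt{ShiftConv2d}, and \texttt{power\_of\_two\_quantise}, the channel ratio, and the power-of-two quantisation scheme are all illustrative assumptions, not the paper's exact design.

\begin{verbatim}
# Minimal sketch of a GhostSA-style block (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def power_of_two_quantise(w, min_exp=-8, max_exp=0):
    """Round weights to the nearest signed power of two.

    Multiplying by sign * 2**exp can be realised as a bit-shift, which is the
    simplification this sketch assumes for "shift" convolutions.
    """
    with torch.no_grad():
        sign = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
        exp = torch.clamp(torch.round(torch.log2(w.abs().clamp(min=1e-8))),
                          min_exp, max_exp)
        w_q = sign * torch.pow(2.0, exp)
    # Straight-through estimator: forward uses w_q, gradients flow to w.
    return w + (w_q - w).detach()


class ShiftConv2d(nn.Conv2d):
    """Convolution whose effective weights are signed powers of two."""

    def forward(self, x):
        return F.conv2d(x, power_of_two_quantise(self.weight), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)


class GhostSABlock(nn.Module):
    """Produce out_channels maps: a few intrinsic maps from a shift conv,
    plus cheap depthwise shift transforms for the remaining 'ghost' maps."""

    def __init__(self, in_channels, out_channels, ratio=2, dw_kernel=3):
        super().__init__()
        intrinsic = out_channels // ratio
        self.primary = nn.Sequential(
            ShiftConv2d(in_channels, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic),
            nn.ReLU(inplace=True),
        )
        # Depthwise shift conv: each ghost map is derived from one intrinsic
        # map with shift/add arithmetic only (no full multiplications).
        self.cheap = nn.Sequential(
            ShiftConv2d(intrinsic, out_channels - intrinsic, dw_kernel,
                        padding=dw_kernel // 2, groups=intrinsic, bias=False),
            nn.BatchNorm2d(out_channels - intrinsic),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)


if __name__ == "__main__":
    block = GhostSABlock(16, 32)
    print(block(torch.randn(1, 16, 56, 56)).shape)  # torch.Size([1, 32, 56, 56])
\end{verbatim}

Under these assumptions, the ratio of intrinsic to ghost channels and the quantisation exponent range are the knobs one would schedule per hardware platform, trading accuracy against the number of shift and addition operations.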