Compared with cheap addition operations, multiplication operations have much higher computational complexity. The widely-used convolutions in deep neural networks are exactly cross-correlations measuring the similarity between the input feature and the convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) that trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between the filters and the input feature as the output response. We first develop a theoretical foundation for AdderNets by showing that both the single-hidden-layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. An approximation bound for AdderNets with a single hidden layer is also presented. We further analyze the influence of this new similarity measure on the optimization of neural networks and develop a special training scheme for AdderNets. Based on the gradient magnitude, an adaptive learning rate strategy is proposed to enhance the training procedure of AdderNets. AdderNets achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers.
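The core idea, replacing the cross-correlation in a convolutional layer with a negative $\ell_1$-norm distance so that only additions, subtractions, and absolute values are used, can be sketched as follows. This is a minimal illustrative implementation written for this summary, not the authors' reference code; the function name, the plain nested loops, and the single-image (no batch) layout are assumptions made for clarity.

```python
import numpy as np

def adder_conv2d(x, filters, stride=1):
    """Multiplication-free 'convolution' in the AdderNet style:
    the output response is the negative l1-norm distance between
    each filter and the corresponding input patch,
        Y[m, i, j] = -sum |X_patch - F_m|.
    x:       input feature map, shape (c_in, h, w)
    filters: shape (c_out, c_in, k, k)
    """
    c_in, h, w = x.shape
    c_out, _, k, _ = filters.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                # Only subtraction, absolute value, and summation:
                # no multiplications between features and weights.
                y[o, i, j] = -np.abs(patch - filters[o]).sum()
    return y
```

Because the response is a negative distance, a filter that exactly matches its input patch produces the maximum possible output of zero, mirroring how a large cross-correlation indicates high similarity in an ordinary convolution.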