It has been shown that, compared with the 32-bit floating-point numbers used in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate at low precision during inference, saving memory space and power consumption. However, quantizing networks is always accompanied by a drop in accuracy. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST), that uses the attention mechanism to quantize networks and achieves state-of-the-art results. With our method, a 3-bit model can reach an accuracy that exceeds that of the full-precision baseline model. The proposed double-stage ST activation quantization is easy to apply: simply insert it before the convolution.
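To make the "insert it before the convolution" deployment concrete, below is a minimal PyTorch sketch of the general idea: an SE-style attention branch squeezes the input to per-channel statistics and predicts a per-channel threshold, against which the activations are uniformly quantized before reaching the convolution. The class name `DoubleStageST`, the two-layer squeeze branch, the reduction ratio, and the straight-through rounding are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class DoubleStageST(nn.Module):
    """Illustrative sketch (not the paper's exact design): an SE-style
    attention block that squeezes the input to per-channel statistics,
    maps them through a two-stage fully connected branch to a learned
    per-channel threshold, and quantizes the activations against that
    threshold before they reach the convolution."""

    def __init__(self, channels: int, bits: int = 3, reduction: int = 4):
        super().__init__()
        self.levels = 2 ** bits - 1  # number of positive quantization levels
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial statistics
        self.fc = nn.Sequential(             # two-stage threshold predictor (assumed)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # threshold scale in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Per-channel threshold from the attention branch, as a fraction
        # of each channel's maximum absolute activation.
        t = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        scale = t * x.abs().amax(dim=(2, 3), keepdim=True).detach() + 1e-8
        y = torch.clamp(x / scale, 0.0, 1.0)
        # Round to 2^bits - 1 uniform levels; the straight-through
        # estimator lets training gradients bypass the rounding.
        y_q = torch.round(y * self.levels) / self.levels
        y = y + (y_q - y).detach()
        return y * scale


# Applying the module is a matter of placing it before a convolution:
block = nn.Sequential(
    DoubleStageST(64, bits=3),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
)
out = block(torch.randn(8, 64, 32, 32))
```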