At present, quantization methods for neural network models fall mainly into two categories: post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small amount of data to complete the quantization process, but the performance of the quantized model is not as good as that obtained with quantization-aware training. This paper presents a novel quantization method called Attention Round. This method gives a parameter w the opportunity to be mapped to all possible quantized values, rather than only the two quantized values nearest to w. The probability of being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays following a Gaussian function. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers of the model, thereby addressing mixed-precision quantization while avoiding a combinatorial optimization problem. Quantization experiments on different models confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process, and achieves quantization performance on par with quantization-aware training.
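To make the Gaussian-decaying stochastic rounding idea concrete, the following is a minimal NumPy sketch of that mechanism, not the paper's actual implementation: the uniform grid construction, the `sigma` bandwidth, and the function name `attention_round` are assumptions introduced here for illustration.

```python
# Hypothetical sketch: stochastic quantization where the probability of mapping a
# weight to a grid value decays with a Gaussian of their distance, so values other
# than the two nearest neighbours still have a small chance of being selected.
import numpy as np

def attention_round(w, num_bits=4, sigma=0.05, rng=None):
    """Stochastically map each weight to one value of a uniform quantization grid."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w, dtype=np.float64)

    # Uniform grid covering the weight range (an assumed, simplified quantizer).
    levels = 2 ** num_bits
    grid = np.linspace(w.min(), w.max(), levels)           # shape (levels,)

    # Gaussian-decaying, distance-based probabilities for every weight/grid pair.
    dist = np.abs(w[..., None] - grid)                      # shape (..., levels)
    probs = np.exp(-dist ** 2 / (2 * sigma ** 2))
    probs /= probs.sum(axis=-1, keepdims=True)

    # Sample one grid index per weight according to those probabilities.
    flat_probs = probs.reshape(-1, levels)
    idx = np.array([rng.choice(levels, p=p) for p in flat_probs])
    return grid[idx].reshape(w.shape)

# Example: quantize a small random weight tensor to 4 bits.
weights = np.random.randn(8) * 0.1
print(attention_round(weights, num_bits=4))
```

In this sketch, shrinking `sigma` concentrates the probability mass on the nearest grid values and recovers behavior close to ordinary rounding, while a larger `sigma` spreads mass to more distant quantized values.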