As a neural network compression technique, post-training quantization (PTQ) transforms a pre-trained model into a quantized model that uses a lower-precision data type. However, quantization noise reduces prediction accuracy, especially in extremely low-bit settings. The main challenge is determining appropriate quantization parameters (e.g., scaling factors and the rounding of weights). Many existing methods determine these parameters by minimizing the distance between features before and after quantization, but using this distance as the optimization metric considers only local information. We analyze the problem of minimizing local metrics and show that it does not yield optimal quantization parameters. Furthermore, because PTQ uses only a small number of calibration samples, the quantized model suffers from overfitting. In this paper, we propose PD-Quant to address these problems. PD-Quant uses the difference between the network's predictions before and after quantization to determine the quantization parameters. To mitigate overfitting, PD-Quant adjusts the distribution of activations during PTQ. Experiments show that PD-Quant finds better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, with 2-bit weights and 2-bit activations, PD-Quant raises the accuracy of ResNet-18 to 53.08% and of RegNetX-600MF to 40.92%. The code will be released at https://github.com/hustvl/PD-Quant.
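To make the contrast between a local metric and a prediction-difference metric concrete, here is a minimal sketch under several assumptions: a toy two-layer ReLU network, uniform symmetric fake-quantization of only the first layer's weights, and a grid search over a single scaling factor. The names `fake_quant` and `search_scale`, the toy weights, and the use of KL divergence between softmax outputs as the prediction-difference measure are all illustrative assumptions; this is not the PD-Quant implementation.

```python
import math

def fake_quant(x, scale, bits):
    """Uniform symmetric fake-quantization: round x/scale to the nearest
    integer (ties rounded up), clamp to the signed b-bit range, rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = math.floor(x / scale + 0.5)
    q = max(qmin, min(qmax, q))
    return q * scale

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl_div(p, q):
    """KL(p || q); softmax outputs are strictly positive, so no epsilon is needed."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def forward(w1, w2, x):
    """Toy network: returns layer-1 features (after ReLU) and the output logits."""
    h = [max(0.0, v) for v in matvec(w1, x)]
    return h, matvec(w2, h)

def search_scale(w1, w2, calib, bits, candidates, metric):
    """Pick the scale for w1 that minimizes the chosen metric on the calibration set.
    metric == "local": MSE between float and quantized layer-1 features.
    otherwise: KL divergence between float and quantized predictions."""
    best_scale, best_loss = None, float("inf")
    for s in candidates:
        w1q = [[fake_quant(v, s, bits) for v in row] for row in w1]
        loss = 0.0
        for x in calib:
            h_fp, z_fp = forward(w1, w2, x)
            h_q, z_q = forward(w1q, w2, x)
            if metric == "local":
                loss += sum((a - b) ** 2 for a, b in zip(h_fp, h_q)) / len(h_fp)
            else:
                loss += kl_div(softmax(z_fp), softmax(z_q))
        if loss < best_loss:
            best_scale, best_loss = s, loss
    return best_scale, best_loss / len(calib)

if __name__ == "__main__":
    # Hypothetical toy weights and calibration inputs, for illustration only.
    w1 = [[0.9, -0.4, 0.2], [0.1, 0.7, -0.8], [-0.5, 0.3, 0.6]]
    w2 = [[1.2, -0.7, 0.4], [-0.3, 0.9, 0.8]]
    calib = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8], [-0.6, 0.2, 1.1]]
    candidates = [0.05 * k for k in range(1, 11)]
    for metric in ("local", "prediction"):
        s, loss = search_scale(w1, w2, calib, bits=2, candidates=candidates, metric=metric)
        print(f"{metric:>10}: scale={s:.2f}, mean loss={loss:.4f}")
```

The two searches differ only in the loss: the local metric compares first-layer features directly, while the prediction metric propagates the quantization error through the rest of the network before comparing outputs, so the scale it selects reflects how later layers respond to that error.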