Post-training quantization (PTQ) has attracted increasing attention because of its convenience for deploying quantized neural networks. Rounding is the primary source of quantization error, and previous works adopt the rounding-to-nearest scheme with a constant border of 0.5. This work demonstrates that optimizing the rounding scheme can improve model accuracy. By replacing the constant border with a simple border function, we can minimize the error of multiplying two numbers and eliminate the bias of its expected value, which further benefits model accuracy. Based on this insight, we approximate the border function so that the incurred overhead is negligible. We also jointly optimize the propagated error and the global error. Finally, we propose the AQuant framework, which learns the border function automatically. Extensive experiments show that AQuant achieves noticeable improvements over state-of-the-art methods and pushes the accuracy of ResNet-18 up to 60.31% under 2-bit weight and activation post-training quantization.
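To make the rounding-border idea concrete, the following is a minimal sketch contrasting round-to-nearest (constant border 0.5) with border-based rounding. The scale value, the `border_fn` interface, and the `toy_border` function are illustrative assumptions for this sketch, not the learned border function used by AQuant.

```python
import numpy as np

def quantize_round_to_nearest(x, scale):
    # Standard PTQ rounding: constant border of 0.5 (round-to-nearest).
    return np.floor(x / scale + 0.5) * scale

def quantize_with_border(x, scale, border_fn):
    # Border-based rounding sketch: the fractional part is compared against
    # a per-value border b(x) instead of the constant 0.5, so the rounding
    # direction can adapt to reduce multiplication error.
    q = x / scale
    frac = q - np.floor(q)
    return (np.floor(q) + (frac >= border_fn(x)).astype(q.dtype)) * scale

# Toy border function for illustration only; AQuant learns such a function
# automatically rather than using a hand-crafted rule like this one.
toy_border = lambda x: np.clip(0.5 - 0.1 * np.sign(x), 0.0, 1.0)

x = np.random.randn(8).astype(np.float32)
print(quantize_round_to_nearest(x, 0.05))
print(quantize_with_border(x, 0.05, toy_border))
```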