Post-training quantization (PTQ) has attracted increasing attention due to its convenience in deploying quantized neural networks. Rounding, a primary source of quantization error, has so far been optimized only for model weights, while activations are still quantized with plain rounding-to-nearest. In this work, we demonstrate for the first time that well-chosen rounding schemes for activations can improve the final accuracy. To handle the dynamic nature of activations, we adaptively adjust the rounding border through a simple function that generates rounding schemes at the inference stage. The border function accounts for the impact of weight errors, activation errors, and propagated errors, eliminating the bias of the element-wise error and further benefiting model accuracy. We also make the border aware of global errors to better fit the activations arriving at inference. Finally, we propose the AQuant framework to learn the border function. Extensive experiments show that AQuant achieves noticeable improvements with negligible overhead compared with state-of-the-art methods and pushes the accuracy of ResNet-18 up to 60.3\% under 2-bit weight and activation post-training quantization.
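As a minimal sketch of the idea (the notation below is our own illustration, not taken from the abstract): with quantization step size $s$, rounding-to-nearest quantizes an activation $x$ as $\hat{x} = s\,\lfloor x/s + 0.5 \rfloor$, i.e., it rounds up whenever the fractional part of $x/s$ reaches the fixed border $0.5$. Replacing that constant with an input-dependent border $b(x)$ gives
\begin{equation*}
\hat{x} = s\left(\left\lfloor \tfrac{x}{s} \right\rfloor + \mathbb{1}\!\left[\tfrac{x}{s} - \left\lfloor \tfrac{x}{s} \right\rfloor \ge b(x)\right]\right),
\end{equation*}
which recovers rounding-to-nearest when $b(x) \equiv 0.5$ and otherwise biases rounding up or down to compensate for accumulated quantization error.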