Quantization is one of the most effective methods to compress neural networks and has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods do not perform well on vision transformers, causing more than a 1% accuracy drop even with 8-bit quantization. We therefore analyze the problems of quantizing vision transformers. We observe that the distributions of activation values after the softmax and GELU functions are quite different from a Gaussian distribution, and that common quantization metrics, such as MSE and cosine distance, are inaccurate for determining the optimal scaling factor. In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values, and a Hessian-guided metric to evaluate different scaling factors, which improves the accuracy of calibration at a small cost. To enable fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show that the quantized vision transformers achieve near-lossless prediction accuracy (less than a 0.5% drop with 8-bit quantization) on the ImageNet classification task.
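To make the twin uniform quantization idea concrete, here is a minimal NumPy sketch, not the paper's implementation: activations are covered by two uniform ranges with separate scaling factors, each quantized with k-1 bits, with the remaining bit selecting the range. The function name, the two scale parameters s1 and s2, and the range-assignment rule are illustrative assumptions.

```python
import numpy as np

def twin_uniform_quantize(x, s1, s2, k=8):
    # Hypothetical sketch of k-bit twin uniform quantization: two uniform
    # ranges with their own scaling factors (s1 fine, s2 coarse), each
    # using k-1 bits; one bit is reserved to select the range.
    levels = 2 ** (k - 1)
    # Range 1: fine scale s1, suited to the many small post-softmax values.
    q1 = np.clip(np.round(x / s1), 0, levels - 1) * s1
    # Range 2: coarse scale s2, covering the few large values.
    q2 = np.clip(np.round(x / s2), 0, levels - 1) * s2
    # Assign each value to the range that reconstructs it more accurately
    # (an illustrative rule; the paper's assignment may differ).
    return np.where(np.abs(x - q1) <= np.abs(x - q2), q1, q2)
```

For post-softmax activations, which lie in [0, 1], the coarse scale can be fixed so that its range spans the whole interval while the fine scale is calibrated to the dense mass of small values; for post-GELU activations, the two ranges can separately cover the negative and positive values.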
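The Hessian-guided metric can be sketched in the same spirit. The idea is to score each candidate scaling factor by its estimated effect on the task loss, using the squared gradient of the loss with respect to a layer's output as a diagonal proxy for the Hessian; the function and argument names below are illustrative assumptions, not the framework's API.

```python
import numpy as np

def hessian_guided_score(out_fp, out_q, grad_out):
    # Estimate the task-loss increase caused by quantizing one layer:
    # weight the squared output perturbation by the squared loss gradient,
    # a diagonal approximation of the Hessian (illustrative sketch).
    delta = out_q - out_fp
    return float(np.sum((grad_out ** 2) * (delta ** 2)))
```

During calibration, the gradients would be collected once on a small calibration set, candidate scaling factors enumerated per layer, and the factor with the lowest score kept, which is why the metric improves calibration accuracy at only a small cost.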