The complicated architecture and high training cost of vision transformers urge the exploration of post-training quantization. However, the heavy-tailed distribution of vision transformer activations hinders the effectiveness of previous post-training quantization methods, even with advanced quantizer designs. Instead of tuning the quantizer to better fit the complicated activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers. We make a surprising theoretical discovery: for a given quantizer, adding a fixed Uniform noisy bias to the values being quantized can significantly reduce the quantization error under provable conditions. Building on this theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution with an additive noisy bias to fit a given quantizer. Extensive experiments show that NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead. For instance, with linear uniform 6-bit activation quantization, NoisyQuant improves SOTA top-1 accuracy on ImageNet by up to 1.7%, 1.1%, and 0.5% for ViT, DeiT, and Swin Transformer respectively, achieving on-par or even higher performance than previous nonlinear, mixed-precision quantization methods.
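The dithering intuition behind the noisy-bias result can be seen in a minimal numerical sketch. This is a hedged illustration, not the authors' exact NoisyQuant procedure: the uniform round-to-nearest quantizer, the step size, the noise range, and the example values below are all assumptions chosen for demonstration. A value whose rounding error is large (near a quantization decision boundary) sees its expected squared error drop when a Uniform noisy bias is added before quantization and subtracted afterwards, while a value that already sits near a quantization level does not benefit, consistent with the "under provable conditions" qualifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, step):
    """Uniform round-to-nearest quantizer with a fixed step size (assumed quantizer)."""
    return np.round(x / step) * step

step = 0.1         # assumed quantization step
x_boundary = 0.05  # value near a decision boundary: large rounding error
x_center = 0.101   # value near a quantization level: small rounding error

# Uniform noisy bias; in practice a single fixed sample would be reused, here we
# average the squared error over many draws to estimate the expectation.
noise = rng.uniform(-step / 2, step / 2, size=100_000)

for x in (x_boundary, x_center):
    plain_err = (quantize(x, step) - x) ** 2
    noisy_err = np.mean((quantize(x + noise, step) - noise - x) ** 2)
    print(f"x={x:.3f}  plain error^2={plain_err:.2e}  with noisy bias={noisy_err:.2e}")
```

In this toy setting the boundary value improves from an error of roughly step^2/4 to roughly step^2/12, whereas the near-center value gets slightly worse, which is why the method relies on choosing the noisy bias to suit the observed activation distribution rather than applying it blindly.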