Network quantization is a dominant paradigm of model compression. However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading performance. Recently, Sharpness-Aware Minimization (SAM) has been proposed to smooth the loss landscape and improve the generalization performance of models. Nevertheless, directly applying SAM to quantized models can lead to perturbation mismatch or diminishment issues, resulting in suboptimal performance. In this paper, we propose a novel method, dubbed Sharpness-Aware Quantization (SAQ), to explore, for the first time, the effect of SAM on model compression, particularly quantization. Specifically, we first provide a unified view of quantization and SAM by treating them as introducing quantization noise and adversarial perturbation to the model weights, respectively. According to whether the noise and perturbation terms depend on each other, SAQ can be formulated into three cases, which are analyzed and compared comprehensively. Furthermore, by introducing an efficient training strategy, SAQ incurs only little additional training overhead compared with the default optimizer (e.g., SGD or AdamW). Extensive experiments on both convolutional neural networks and Transformers across various datasets (ImageNet, CIFAR-10/100, Oxford Flowers-102, Oxford-IIIT Pets) show that SAQ improves the generalization performance of quantized models, yielding state-of-the-art (SOTA) results for uniform quantization. For example, on ImageNet, SAQ outperforms AdamW by 1.2% in Top-1 accuracy for 4-bit ViT-B/16. Our 4-bit ResNet-50 surpasses the previous SOTA method by 0.9% in Top-1 accuracy.
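As a minimal sketch of the unified view described above, the following uses standard SAM notation; the loss L, quantizer Q(·), quantization noise n(w), perturbation ε, and radius ρ are assumptions for illustration, and the paper's exact formulation of the three cases may differ.

% Sketch only: notation is assumed for illustration, not taken verbatim from the paper.
% Standard SAM objective: minimize the worst-case loss within an L2-ball of radius \rho
% around the weights w.
\begin{equation}
  \min_{w}\ \max_{\|\epsilon\|_2 \le \rho}\ L(w + \epsilon)
\end{equation}
% Quantization-aware training with quantizer Q(\cdot), viewed as adding quantization
% noise n(w) = Q(w) - w to the weights.
\begin{equation}
  \min_{w}\ L(Q(w)) = \min_{w}\ L(w + n(w))
\end{equation}
% One illustrative SAQ case: the adversarial perturbation is applied on top of the
% quantized weights; the three cases mentioned in the abstract differ in whether
% \epsilon and n(w) depend on each other.
\begin{equation}
  \min_{w}\ \max_{\|\epsilon\|_2 \le \rho}\ L(Q(w) + \epsilon)
\end{equation}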