Post-training quantization (PTQ) is widely regarded as one of the most practically efficient compression methods, benefiting from its data privacy and low computational cost. We argue that the oscillation problem in PTQ methods has been overlooked. In this paper, we take the initiative to explore this problem and present a theoretical proof explaining why it is essential in PTQ, and we then introduce a principled and generalized framework to solve it. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in module capacity. To this end, we define the module capacity (ModCap) under both data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, on 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The gain is more significant for small-model quantization, e.g., surpassing BRECQ by 6.61% on MobileNetV2*0.5.
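The top-k selection step described above can be sketched as follows. This is a minimal illustration, assuming per-module ModCap values are already available as a list of scalars; the function name `select_oscillation_modules` and the toy numbers are hypothetical and not taken from the paper. It only shows how adjacent differentials are ranked and the k largest gaps picked out; the actual ModCap computation and the joint optimization/quantization of the selected modules are separate steps of the method.

```python
import numpy as np

def select_oscillation_modules(modcaps, k):
    """Given per-module capacities (ModCap), compute the differentials between
    adjacent modules and return the index pairs around the top-k largest gaps.
    The modules around each selected gap would then be jointly optimized and
    quantized (a sketch of the selection step only)."""
    modcaps = np.asarray(modcaps, dtype=float)
    # Differential between each pair of adjacent modules; a large gap is used
    # as a proxy for the degree of oscillation at that point in the network.
    diffs = np.abs(np.diff(modcaps))
    # Indices of the k largest differentials (gap i lies between modules i and i+1).
    topk = np.argsort(diffs)[::-1][:k]
    return [(int(i), int(i) + 1) for i in topk]

# Example with hypothetical ModCap values for a 6-module network and k = 2.
print(select_oscillation_modules([3.1, 0.8, 2.9, 2.7, 0.5, 2.6], k=2))
```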