We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not require lengthy retraining. Power-of-Two quantization converts the multiplications introduced by quantization and dequantization into bit-shift operations, which are adopted by many efficient accelerators. However, Power-of-Two scale factors have fewer candidate values, which leads to larger rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet demonstrate the excellent performance of our proposed method. Without bells and whistles, RAPQ reaches 65% and 48% accuracy on ResNet-18 and MobileNetV2, respectively, with INT2 weights and INT4 activations. We are the first to propose a dedicated Power-of-Two quantization scheme, more constrained but hardware-friendly, for low-bit PTQ, and we show that it can achieve nearly the same accuracy as state-of-the-art PTQ methods. The code is released.
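To make the bit-shift intuition concrete, below is a minimal NumPy sketch of symmetric quantization with a Power-of-Two scale 2^k. The helper names and parameter choices are ours for illustration only, not the released RAPQ code: with the scale restricted to a power of two, dequantization reduces to an integer shift on fixed-point hardware, and the coarse grid of admissible k values is what sharpens the rounding/clipping trade-off the paper addresses.

```python
import numpy as np

def quantize_pot(x, k=-3, b=2):
    """Hypothetical helper: symmetric b-bit quantization with scale 2**k.
    Rounding to the grid and clipping to the integer range are the two
    error sources that the choice of k trades off against each other."""
    q_max = 2 ** (b - 1) - 1          # e.g. +1 for INT2
    q_min = -2 ** (b - 1)             # e.g. -2 for INT2
    q = np.clip(np.round(x / (2.0 ** k)), q_min, q_max)
    return q.astype(np.int32)

def dequantize_pot(q, k=-3):
    """Multiply by the Power-of-Two scale; on integer hardware this
    multiplication becomes a left shift (or right shift for negative k)."""
    return q.astype(np.float32) * (2.0 ** k)

# Toy usage: a larger k (coarser scale) increases rounding error, a smaller k
# increases clipping error; RAPQ balances this trade-off network-wide.
x = np.random.randn(8).astype(np.float32)
q = quantize_pot(x, k=-3, b=2)        # INT2 values with scale 2**-3
print(q, dequantize_pot(q, k=-3))
```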