The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its uniform counterpart, owing to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process required to implement nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that maintains the strong representational ability of nonuniform methods while being as hardware-friendly and efficient as uniform quantization for model inference. We achieve this by learning flexible non-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for the otherwise intractable backward derivative calculation w.r.t. the threshold parameters. Additionally, we consider an entropy-preserving regularization to further reduce information loss in weight quantization. Even under the adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.7~1.8% on ImageNet, demonstrating the contribution of the N2UQ design. Code will be made publicly available.
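As a rough illustration of the idea described above (a sketch under assumptions, not the authors' implementation), the code below shows a quantizer whose learnable, non-equidistant input thresholds bin real-valued inputs into equidistant integer output levels, with gradients routed through a piecewise-linear surrogate in the spirit of a straight-through estimator so that the threshold parameters also receive gradients. PyTorch is assumed, and all class and parameter names are illustrative.

```python
# Hedged sketch (PyTorch assumed; names illustrative, not the paper's code).
# Forward: real-valued inputs are binned by learnable, non-equidistant
# thresholds into equidistant integer levels 0..2^b - 1, so inference only
# needs uniform output levels. Backward: gradients flow through a
# piecewise-linear surrogate whose slope is 1/width inside each learned
# interval, a G-STE-like straight-through construction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonuniformToUniformQuantizer(nn.Module):
    def __init__(self, num_bits: int = 2):
        super().__init__()
        self.num_levels = 2 ** num_bits
        # Learnable (unconstrained) interval widths; thresholds are their cumulative sums.
        self.raw_widths = nn.Parameter(torch.ones(self.num_levels - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        widths = F.softplus(self.raw_widths)            # positive interval widths
        thresholds = torch.cumsum(widths, dim=0)        # non-equidistant thresholds
        lowers = torch.cat([thresholds.new_zeros(1), thresholds[:-1]])

        # Piecewise-linear surrogate: rises by 1 across each learned interval.
        soft = sum(torch.clamp((x - lo) / w, 0.0, 1.0) for lo, w in zip(lowers, widths))
        # Hard forward: number of thresholds crossed -> equidistant levels 0..L-1.
        hard = sum((x >= t).to(x.dtype) for t in thresholds)

        # Straight-through trick: forward uses the hard levels, backward uses the
        # surrogate, so both x and the threshold parameters receive gradients.
        return hard.detach() + soft - soft.detach()


# Usage sketch: quantize toy activations and backprop into the thresholds.
if __name__ == "__main__":
    q = NonuniformToUniformQuantizer(num_bits=2)
    x = (torch.rand(8) * 4.0).requires_grad_()
    q(x).sum().backward()
    print(x.grad, q.raw_widths.grad)
```

The surrogate-plus-detach pattern is only one plausible way to realize the learnable-threshold quantization with uniform outputs sketched in the abstract; the paper's actual G-STE derivation and entropy-preserving regularization are described in the main text.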