To deploy deep models in a computationally efficient manner, model quantization approaches have been widely used. In addition, with the emergence of new hardware that supports mixed-bitwidth arithmetic operations, recent research on mixed-precision quantization (MPQ) has begun to fully leverage the representational capacity of networks by searching for optimized bitwidths for different layers and modules. However, previous studies mostly search the MPQ strategy via costly schemes such as reinforcement learning or neural architecture search, or simply rely on partial prior knowledge for bitwidth assignment, which can be biased and sub-optimal. In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that automatically learns the MPQ strategy in a more flexible, globally-optimized space with smoother gradient approximation. In particular, Differentiable Bitwidth Parameters (DBPs) are employed as the probability factors in stochastic quantization between adjacent bitwidth choices. After the optimal MPQ strategy is acquired, we further train our network with entropy-aware bin regularization and knowledge distillation. We extensively evaluate our method on several networks across different hardware (GPUs and FPGA) and datasets. SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods at a lower bitwidth, and even surpasses the full-precision counterparts across various ResNet and MobileNet families, demonstrating the effectiveness and superiority of our method.
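To make the core idea concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation) of a learnable probability factor that stochastically selects between two adjacent bitwidths. All names (`uniform_quantize`, `StochasticBitwidthLayer`, `dbp_logit`) are hypothetical; the straight-through estimator for the rounding and sampling steps is one common choice for making such a selection differentiable.

```python
import torch


def uniform_quantize(x, bits):
    """Uniform quantization of x (clamped to [0, 1]) onto 2**bits - 1 levels,
    using a straight-through estimator for the non-differentiable rounding."""
    x = x.clamp(0.0, 1.0)
    levels = 2 ** bits - 1
    q = torch.round(x * levels) / levels
    # Forward pass uses q; backward pass treats rounding as identity.
    return x + (q - x).detach()


class StochasticBitwidthLayer(torch.nn.Module):
    """Illustrative sketch: a scalar Differentiable Bitwidth Parameter (DBP)
    acts as the probability of picking the higher of two adjacent bitwidths,
    so gradients from the task loss can flow back to the bitwidth choice."""

    def __init__(self, low_bits=4, high_bits=8):
        super().__init__()
        self.low_bits, self.high_bits = low_bits, high_bits
        # Logit of the probability of selecting the higher bitwidth.
        self.dbp_logit = torch.nn.Parameter(torch.zeros(()))

    def forward(self, x):
        p = torch.sigmoid(self.dbp_logit)  # probability of high bitwidth
        if self.training:
            # Bernoulli sample; straight-through lets gradients reach p.
            gate = torch.bernoulli(p)
            gate = p + (gate - p).detach()
        else:
            # Deterministic choice once the MPQ strategy is fixed.
            gate = (p > 0.5).float()
        q_low = uniform_quantize(x, self.low_bits)
        q_high = uniform_quantize(x, self.high_bits)
        return gate * q_high + (1.0 - gate) * q_low


# Usage: quantize a tensor while learning the bitwidth preference.
layer = StochasticBitwidthLayer(low_bits=4, high_bits=8)
y = layer(torch.rand(8))
```

Under these assumptions, each training step samples one of the two adjacent bitwidths, and the expected quantizer output is differentiable in the DBP, which is how a per-layer bitwidth assignment can be optimized jointly with the network weights.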