Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers; they often suffer significant accuracy degradation in the ultra-low-precision regime and ignore the fact that emerging hardware accelerators have begun to support mixed-precision computation. In this paper, we present a novel and principled framework to solve the mixed-precision quantization problem. Briefly, we first formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to compute its Hessian matrix. Finally, based on this simplification, we show that the original problem can be reformulated as a Multiple-Choice Knapsack Problem (MCKP) and propose a greedy search algorithm to solve it efficiently. Compared with existing mixed-precision quantization works, our method is derived in a principled way and is much more computationally efficient. Extensive experiments on the ImageNet dataset across various network architectures further demonstrate its superiority over existing uniform and mixed-precision quantization approaches.
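To make the MCKP reformulation concrete, below is a minimal sketch of one plausible greedy heuristic, not the paper's exact algorithm. In the MCKP view, each layer is a group, each candidate bit-width is an item in that group, and exactly one item per group must be chosen under a total budget. The inputs `delta_loss` (per-layer, per-bit-width loss increases, which the paper estimates via the second-order Taylor term) and `size` (per-choice cost, e.g., layer size in bits) are hypothetical placeholders.

```python
def greedy_mckp(delta_loss, size, budget):
    """Pick one bit-width per layer, minimizing total estimated loss
    increase subject to a total cost budget.

    delta_loss[l] : dict mapping bit-width -> estimated loss increase
    size[l]       : dict mapping bit-width -> cost of that choice
    budget        : total allowed cost
    """
    num_layers = len(delta_loss)
    # Start every layer at its cheapest (lowest bit-width) option.
    choice = [min(size[l], key=size[l].get) for l in range(num_layers)]
    used = sum(size[l][choice[l]] for l in range(num_layers))

    while True:
        best = None  # (loss reduction per unit cost, layer, new bit-width)
        for l in range(num_layers):
            cur_b = choice[l]
            for b in delta_loss[l]:
                extra = size[l][b] - size[l][cur_b]
                gain = delta_loss[l][cur_b] - delta_loss[l][b]
                # Consider only strict upgrades that still fit the budget.
                if extra > 0 and gain > 0 and used + extra <= budget:
                    ratio = gain / extra
                    if best is None or ratio > best[0]:
                        best = (ratio, l, b)
        if best is None:
            break  # No affordable upgrade improves the objective.
        _, l, b = best
        used += size[l][b] - size[l][choice[l]]
        choice[l] = b
    return choice

# Toy usage: two layers, candidate bit-widths {2, 4, 8}, made-up numbers.
delta_loss = [{2: 0.90, 4: 0.20, 8: 0.02}, {2: 0.30, 4: 0.10, 8: 0.01}]
size = [{2: 2e6, 4: 4e6, 8: 8e6}, {2: 1e6, 4: 2e6, 8: 4e6}]
print(greedy_mckp(delta_loss, size, budget=8e6))  # -> [4, 8]
```

The ratio test (loss reduction per extra bit of budget) is the standard greedy criterion for knapsack-style problems; it is fast but only approximate, which is consistent with the abstract's emphasis on computational efficiency.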