Mixed-precision quantization has been widely applied to deep neural networks (DNNs), as it yields significantly better efficiency-accuracy tradeoffs than uniform quantization. However, determining the exact precision of each layer remains challenging. Previous attempts at bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. In this work, we propose Continuous Sparsification Quantization (CSQ), a bit-level training method that searches for mixed-precision quantization schemes with improved stability. CSQ stabilizes the bit-level mixed-precision training process with a bi-level gradual continuous sparsification, applied both to the bit values of the quantized weights and to the bit selection that determines the quantization precision of each layer. The continuous sparsification scheme enables fully differentiable training without gradient approximation while yielding an exactly quantized model in the end. A budget-aware regularization of the total model size enables the dynamic growth and pruning of each layer's precision toward a mixed-precision quantization scheme of the desired size. Extensive experiments show that CSQ achieves a better efficiency-accuracy tradeoff than previous methods on multiple models and datasets.
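To make the bi-level idea concrete, the following is a minimal sketch (not the authors' implementation) of how bit-level continuous sparsification and a budget-aware regularizer could be expressed in a PyTorch-style API; the class and function names (`BitLevelWeight`, `budget_loss`), the sigmoid-with-temperature relaxation, and the unsigned sum-of-bit-planes weight representation are illustrative assumptions rather than the paper's exact formulation.

```python
# Illustrative sketch of bi-level continuous sparsification for mixed-precision
# quantization (hypothetical code, not from the CSQ paper).
import torch
import torch.nn as nn


class BitLevelWeight(nn.Module):
    """Represents a weight tensor as a sum of soft binary bit-planes.

    Each bit-plane is relaxed to sigmoid(logit / t) (continuous sparsification of
    bit values); each layer also carries per-bit gates that softly select how many
    bits the layer keeps (continuous sparsification of bit selection). As the
    temperature t anneals toward 0, both relaxations approach exact {0, 1} values,
    so training stays fully differentiable yet ends in an exactly quantized model.
    """

    def __init__(self, shape, max_bits=8):
        super().__init__()
        self.max_bits = max_bits
        # Logits for the soft bit values of every weight, one tensor per bit-plane.
        self.bit_logits = nn.Parameter(torch.randn(max_bits, *shape) * 0.01)
        # Logits for the per-layer bit-selection gates.
        self.gate_logits = nn.Parameter(torch.zeros(max_bits))
        # Per-layer scale mapping integer levels back to real values.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, temperature):
        bits = torch.sigmoid(self.bit_logits / temperature)   # soft bits in (0, 1)
        gates = torch.sigmoid(self.gate_logits / temperature)  # soft bit selection
        powers = 2.0 ** torch.arange(self.max_bits, dtype=bits.dtype)
        # Reassemble the (soft) integer weight: sum_k gate_k * bit_k * 2^k.
        w = sum(gates[k] * bits[k] * powers[k] for k in range(self.max_bits))
        return self.scale * w

    def expected_bits(self, temperature):
        # Differentiable estimate of this layer's precision, used by the budget term.
        return torch.sigmoid(self.gate_logits / temperature).sum()


def budget_loss(layers, temperature, target_avg_bits):
    """Budget-aware regularizer pushing the average layer precision toward a target."""
    avg_bits = torch.stack([l.expected_bits(temperature) for l in layers]).mean()
    return torch.relu(avg_bits - target_avg_bits) ** 2
```

In such a setup, the task loss plus `budget_loss` would be minimized jointly while the temperature is annealed, letting gates grow or shrink each layer's precision until the overall model size meets the budget.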