Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource- and latency-constrained settings. Moreover, exploiting the differences in sensitivity to quantization across layers can allow mixed precision implementations to achieve a considerably better computation-performance trade-off. However, backpropagating through the quantization operation requires introducing gradient approximations, and choosing which layers to quantize is challenging for modern architectures due to the large search space. In this work, we present a constrained learning approach to quantization-aware training. We formulate low precision supervised learning as a constrained optimization problem, and show that despite its non-convexity, the resulting problem is strongly dual and does away with gradient estimations. Furthermore, we show that dual variables indicate the sensitivity of the objective with respect to constraint perturbations. We demonstrate that the proposed approach exhibits competitive performance in image classification tasks, and leverage the sensitivity result to apply layer-selective quantization based on the value of dual variables, leading to considerable performance improvements.
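To make the abstract's constrained learning idea concrete, the sketch below shows one possible primal-dual instantiation, assuming the constraints take the per-layer form of proximity between full-precision and quantized weights, ||W_i - Q(W_i)||^2 <= eps_i. This specific constraint form, the uniform quantizer `quantize`, and the hyperparameters `eps`, `lr_primal`, and `lr_dual` are illustrative assumptions, not necessarily the paper's exact formulation. The task loss is evaluated on full-precision weights, so the quantizer only appears inside a smooth penalty and no straight-through gradient estimator is needed; dual ascent on the multipliers yields the per-layer sensitivity signal the abstract describes.

```python
import torch
import torch.nn as nn

def quantize(w, num_bits=4):
    # Uniform symmetric quantizer (an illustrative choice).
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

# Toy model; in practice this would be the network being quantized.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
layers = [m for m in model if isinstance(m, nn.Linear)]

lambdas = torch.zeros(len(layers))  # one dual variable per constrained layer
eps = 1e-2                          # constraint level (hypothetical value)
lr_primal, lr_dual = 1e-2, 1e-2
criterion = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=lr_primal)

for step in range(100):
    x = torch.randn(64, 16)              # stand-in for a data batch
    y = torch.randint(0, 10, (64,))

    # Slack of each constraint: ||W_i - Q(W_i)||^2 - eps_i.
    slacks = torch.stack([
        ((l.weight - quantize(l.weight)) ** 2).mean() - eps for l in layers
    ])

    # Lagrangian = task loss on full-precision weights + lambda^T slack.
    # The quantizer sits inside a squared distance, so autograd computes
    # exact gradients and no gradient approximation is introduced.
    lagrangian = criterion(model(x), y) + (lambdas * slacks).sum()
    opt.zero_grad()
    lagrangian.backward()
    opt.step()

    # Dual ascent: lambda_i grows while constraint i is violated.
    with torch.no_grad():
        lambdas = (lambdas + lr_dual * slacks.detach()).clamp(min=0.0)

# Larger lambda_i flags a layer as more sensitive to quantization; under
# the abstract's sensitivity result, such layers are candidates to keep at
# higher precision when applying layer-selective quantization.
```

In this sketch, reading off the converged multipliers is what operationalizes "layer-selective quantization based on the value of dual variables": layers whose constraints can be tightened cheaply end up with small multipliers and are quantized aggressively.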