Neural network quantization enables the deployment of models on edge devices. An essential requirement for their hardware efficiency is that the quantizers are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To the best of our knowledge, current post-training quantization methods do not support all of these constraints simultaneously. In this work, we introduce a hardware-friendly post-training quantization (HPTQ) framework, which addresses this problem by synergistically combining several known quantization methods. We perform a large-scale study on four tasks: classification, object detection, semantic segmentation, and pose estimation, over a wide variety of network architectures. Our extensive experiments show that competitive results can be obtained under hardware-friendly constraints.
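To make the hardware-friendly constraints concrete, the following is a minimal sketch of a uniform, symmetric quantizer whose threshold is restricted to a power of two. This is an illustrative example only, not the authors' exact HPTQ procedure; the function name and bit-width default are assumptions.

```python
import numpy as np

def hardware_friendly_quantize(x, n_bits=8):
    """Uniform, symmetric quantizer with a power-of-two threshold.

    Illustrative sketch: the threshold is rounded up to the nearest
    power of two, so the resulting scale is itself a power of two and
    the quantizer can be implemented with simple bit shifts in hardware.
    """
    max_abs = np.max(np.abs(x))
    # Power-of-two threshold: smallest 2^k covering the dynamic range.
    threshold = 2.0 ** np.ceil(np.log2(max_abs))
    # Symmetric signed integer range: [-2^(n-1), 2^(n-1) - 1].
    qmax = 2 ** (n_bits - 1) - 1
    scale = threshold / 2 ** (n_bits - 1)
    q = np.clip(np.round(x / scale), -(qmax + 1), qmax)
    return q * scale  # dequantized (fake-quantized) values

x = np.array([-1.3, 0.2, 0.9])
xq = hardware_friendly_quantize(x)
```

Because the scale is a power of two, rescaling after integer arithmetic reduces to a bit shift, which is the main hardware motivation for this constraint.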