Network quantization allows inference to be carried out using low-precision arithmetic, improving the inference efficiency of deep neural networks on edge devices. However, designing aggressively low-bit (e.g., 2-bit) quantization schemes for complex tasks such as object detection remains challenging due to severe performance degradation and unverifiable efficiency on common hardware. In this paper, we propose an Accurate Quantized object Detection solution, termed AQD, to fully get rid of floating-point computation. To this end, we target using fixed-point operations in all kinds of layers, including convolutional layers, normalization layers, and skip connections, allowing inference to be executed with integer-only arithmetic. To demonstrate the improved latency-accuracy trade-off, we apply the proposed method to RetinaNet and FCOS. In particular, experimental results on the MS-COCO dataset show that our AQD achieves comparable or even better performance than its full-precision counterpart under extremely low-bit schemes, which is of great practical value. Source code and models are available at: https://github.com/aim-uofa/model-quantization
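To make the integer-only inference idea concrete, the sketch below illustrates the general pattern rather than AQD's specific scheme: weights and activations are quantized to low-bit integers, accumulation is done in 32/64-bit integers, and the output rescale is approximated by an integer multiplier plus a bit shift, so no floating-point operation is needed at inference time. All function names, bit-widths, and tensor shapes here are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Uniform symmetric quantization of a float tensor to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def fixed_point_multiplier(real_scale, shift_bits=16):
    """Approximate a real-valued rescale factor as (integer multiplier, right shift)."""
    return int(round(real_scale * (1 << shift_bits))), shift_bits

# Toy fully-connected layer: compare the integer-only path with the float reference.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8)).astype(np.float32)
w = rng.standard_normal((8, 4)).astype(np.float32)

qx, sx = quantize_symmetric(x, num_bits=4)   # low-bit activations (assumed 4-bit)
qw, sw = quantize_symmetric(w, num_bits=2)   # low-bit weights (assumed 2-bit)

acc = qx.astype(np.int64) @ qw               # integer accumulation, no floats
m, shift = fixed_point_multiplier(sx * sw)   # rescale factor as fixed-point
y_int = (acc * m) >> shift                   # integer multiply + shift ~ real output

print(y_int)       # integer-only approximation of the layer output
print(x @ w)       # floating-point reference for comparison
```

In a full detector, the same fixed-point rescaling would also absorb the normalization layers and skip connections so that every stage stays in the integer domain; the sketch only shows the core multiply-accumulate-and-rescale step.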