Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which directly converts a full-precision model to a low bit-width one, is an effective and convenient way to reduce inference complexity, but it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters with respect to a metric that measures the perturbation introduced by quantization, and the p-norm distance between feature maps before and after quantization, Lp, is widely used as this metric. For object detection networks, we observe that the value of p in the Lp metric significantly influences quantization performance, and that a fixed hyper-parameter p does not yield optimal results. To mitigate this problem, we propose DetPTQ, a framework that assigns a different p value to each layer using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs this ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that DetPTQ outperforms state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7 (quantized/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weights and 4-bit activations.
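To make the idea concrete, below is a minimal sketch of the Lp perturbation metric and of a per-layer p selection driven by a detection-output loss. The helper names `quantize_fn` and `odol_fn` are hypothetical placeholders, not the paper's actual implementation: `quantize_fn` stands for any routine that searches quantization parameters (e.g., scale and zero-point) under a given metric, and `odol_fn` stands for an evaluation of the ODOL on the resulting detection outputs.

```python
import torch

def lp_distance(fp_feat: torch.Tensor, q_feat: torch.Tensor, p: float) -> torch.Tensor:
    """Lp-norm perturbation between full-precision and quantized feature maps."""
    return (fp_feat - q_feat).abs().pow(p).sum().pow(1.0 / p)

def select_p_for_layer(fp_feat, quantize_fn, candidate_ps, odol_fn):
    """Assumed sketch: pick the p whose Lp-optimal quantization parameters
    minimize the object-detection output loss (ODOL) for this layer."""
    best_p, best_loss = None, float("inf")
    for p in candidate_ps:
        # quantize_fn (hypothetical) searches quantization parameters that
        # minimize the Lp distance for this candidate p
        q_feat = quantize_fn(fp_feat, metric=lambda a, b: lp_distance(a, b, p))
        # odol_fn (hypothetical) scores the perturbation at the detector output
        loss = odol_fn(q_feat)
        if loss < best_loss:
            best_p, best_loss = p, loss
    return best_p
```

In this reading, the adaptive scheme keeps the cheap layer-wise Lp search but lets the task-level ODOL decide, per layer, which p that search should use.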