3D object detection from a single image is an essential and challenging task for autonomous driving. Recently, keypoint-based monocular 3D object detection has made tremendous progress and achieved a favorable speed-accuracy trade-off. However, a large accuracy gap with LiDAR-based methods remains. To improve the performance of keypoint-based detectors without sacrificing efficiency, we propose a lightweight feature pyramid network called Lite-FPN, which performs multi-scale feature fusion effectively and efficiently and thereby boosts the multi-scale detection capability of keypoint-based detectors. In addition, we further relieve the misalignment between classification score and localization precision by introducing a novel regression loss named attention loss. With the proposed loss, predictions with high confidence but poor localization receive more attention during the training phase. Comparative experiments with several state-of-the-art keypoint-based detectors on the KITTI dataset show that our proposed method achieves significantly higher accuracy and a higher frame rate at the same time. The code and pretrained models will be available at https://github.com/yanglei18/Lite-FPN.
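The idea behind the attention loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the weighting below is an assumption made purely for illustration: each prediction's regression loss is scaled by a softmax over `score * (1 - IoU)`, so confident but poorly localized predictions get larger weights. The function names `attention_weights` and `attention_weighted_l1`, the `temperature` parameter, and the use of a scalar L1 loss are all hypothetical and are not taken from the Lite-FPN code.

```python
import math


def attention_weights(scores, ious, temperature=1.0):
    """Per-prediction weights (hypothetical form, not the paper's formula).

    A softmax over score * (1 - IoU), rescaled so the weights average to 1.
    High confidence combined with low IoU gives the largest weight.
    """
    if len(scores) != len(ious):
        raise ValueError("scores and ious must have the same length")
    if not scores:
        return []
    logits = [s * (1.0 - q) / temperature for s, q in zip(scores, ious)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def attention_weighted_l1(pred, target, scores, ious):
    """Mean L1 regression loss with each term scaled by its attention weight."""
    if not pred:
        return 0.0
    weights = attention_weights(scores, ious)
    per_sample = [abs(p - t) for p, t in zip(pred, target)]
    return sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)


if __name__ == "__main__":
    # Three predictions: confident/poor box, confident/good box, unconfident/poor box.
    w = attention_weights([0.9, 0.9, 0.1], [0.2, 0.9, 0.2])
    print([round(x, 3) for x in w])
```

Under this assumed weighting, the first prediction (high score, low IoU) receives the largest weight, and when all predictions have identical scores and IoUs the loss reduces to a plain mean L1 loss. In a real training loop the weights would normally be treated as constants, with no gradient flowing through them; that is also a design assumption rather than something stated in the abstract.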