Estimating 3D bounding boxes from monocular images is an essential component of autonomous driving, yet accurate 3D object detection from such data remains very challenging. In this work, through extensive diagnostic experiments, we quantify the impact of each sub-task and find that 'localization error' is the vital factor restricting monocular 3D detection. We further investigate the underlying causes of localization errors, analyze the issues they can introduce, and propose three strategies. First, we revisit the misalignment between the center of the 2D bounding box and the projected center of the 3D object, which is a key factor behind low localization accuracy. Second, we observe that accurately localizing distant objects is almost impossible with existing technologies, and that such samples mislead the network during training. We therefore propose removing these samples from the training set to improve the overall performance of the detector. Lastly, we propose a novel 3D IoU-oriented loss for object size estimation that is not affected by 'localization error'. We conduct extensive experiments on the KITTI dataset, where the proposed method achieves real-time detection and outperforms previous methods by a large margin. The code will be made available at: https://github.com/xinzhuma/monodle.
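The three strategies can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation (see the repository above for that). All function names, the camera intrinsics, and the depth cutoff are hypothetical and chosen only for illustration. The size loss is a simplified stand-in: it computes 1 - IoU between two boxes placed at the same center with the same orientation, which shows why such a loss is insensitive to localization error. It is not presented as the paper's exact formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Pinhole intrinsics (fx, fy, cx, cy). Values used below are illustrative,
# not real KITTI calibration.
Intrinsics = Tuple[float, float, float, float]


def project_center(K: Intrinsics, center_3d: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point (X, Y, Z) in camera coordinates to pixels (u, v)."""
    fx, fy, cx, cy = K
    x, y, z = center_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def center_misalignment(K: Intrinsics, center_3d: Sequence[float],
                        box_2d: Sequence[float]) -> Tuple[float, float]:
    """Offset (du, dv) from the 2D box center to the projected 3D center.

    A detector that uses the 2D box center as the object's projected
    center inherits this offset as a localization error.
    """
    u, v = project_center(K, center_3d)
    x1, y1, x2, y2 = box_2d
    return u - (x1 + x2) / 2.0, v - (y1 + y2) / 2.0


def drop_distant(objects: List[Dict], max_depth: float) -> List[Dict]:
    """Keep only training objects whose depth (meters) is <= max_depth.

    max_depth is a hypothetical cutoff; the abstract does not state a value.
    """
    return [obj for obj in objects if obj["depth"] <= max_depth]


def size_iou_loss(pred_dims: Sequence[float], gt_dims: Sequence[float]) -> float:
    """1 - IoU of two boxes that share center and orientation.

    Because both boxes sit at the same center, the overlap along each axis
    is min(pred, gt), so the loss depends only on the predicted size and
    not on any error in the predicted location.
    """
    inter = math.prod(min(p, g) for p, g in zip(pred_dims, gt_dims))
    union = math.prod(pred_dims) + math.prod(gt_dims) - inter
    return 1.0 - inter / union


if __name__ == "__main__":
    K = (700.0, 700.0, 600.0, 180.0)  # hypothetical intrinsics
    print(center_misalignment(K, (2.0, 1.0, 20.0), (650.0, 190.0, 690.0, 230.0)))  # (0.0, 5.0)
    print(drop_distant([{"depth": 10.0}, {"depth": 70.0}], max_depth=60.0))  # [{'depth': 10.0}]
    print(size_iou_loss((2.0, 2.0, 2.0), (1.0, 1.0, 1.0)))  # 0.875
```

Placing both boxes at a shared center is the design choice that decouples size supervision from location: a badly localized prediction with the correct dimensions still receives zero size loss.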