3D object detection is a crucial task in autonomous driving and has made great progress in recent years. However, monocular 3D object detection remains challenging because of unsatisfactory depth estimation performance. Most existing monocular methods directly regress the scene depth while ignoring important relationships between depth and various geometric elements (e.g., bounding box sizes, 3D object dimensions, and object poses). In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection. Specifically, we devise a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network. We further implement and embed the proposed formula to enable geometry-aware deep representation learning, allowing effective 2D and 3D interactions that boost depth estimation. Moreover, we provide a strong baseline by addressing the substantial misalignment between 2D annotations and projected boxes, which ensures robust learning with the proposed geometric formula. Experiments on the KITTI dataset show that our method improves the detection performance of the state-of-the-art monocular method without extra data by 2.80% on the moderate test setting. The model and code will be released at https://github.com/YinminZhang/MonoGeo.
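To make the relationship between depth and geometric elements concrete, the following minimal sketch applies the standard pinhole-camera relation z = f * H / h, which ties an object's depth to its 3D height and the pixel height of its 2D box. It is an illustration only and not necessarily the formula proposed in this paper; the function name `depth_from_height` and the numeric values are hypothetical.

```python
def depth_from_height(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Estimate object depth (meters) with the pinhole relation z = f * H / h.

    focal_px     : camera focal length in pixels (illustrative value below)
    height_3d_m  : physical height of the object in meters
    height_2d_px : height of the object's projected 2D box in pixels
    """
    if height_2d_px <= 0:
        raise ValueError("2D box height must be positive")
    return focal_px * height_3d_m / height_2d_px


# Hypothetical example: a 1.5 m tall car that appears 36 px tall
# under an assumed focal length of 720 px.
z = depth_from_height(focal_px=720.0, height_3d_m=1.5, height_2d_px=36.0)
print(z)  # 30.0
```

Because the estimate scales inversely with the 2D box height, a small error in the predicted box translates into a large depth error for distant objects, which is one reason misalignment between 2D annotations and projected boxes matters.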