Probabilistic and Geometric Depth: Detecting Objects in Perspective

3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, a representative general setting among image-based approaches, provides a more economical solution than conventional settings that rely on LiDAR. It has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study of this problem. We observe that current monocular 3D detection can be simplified to an instance depth estimation problem: inaccurate instance depth prevents all the other 3D attribute predictions from improving the overall detection performance. However, recent methods estimate depth directly from isolated instances or pixels and ignore the geometric relations across different objects. These geometric relations can be valuable constraints, since the key information about depth is not directly manifest in a monocular image. Therefore, we construct geometric relation graphs across predicted objects and use the graphs to facilitate depth estimation. As the preliminary depth estimate of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. It provides an important indicator for identifying confident predictions and further guides the depth propagation. Despite the simplicity of the basic idea, our method obtains significant improvements on the KITTI and nuScenes benchmarks, achieving 1st place among all monocular vision-only methods while still maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
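The two ideas in the abstract, a probabilistic depth representation and confidence-guided propagation across objects, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names `expected_depth` and `propagate_depth`, the use of the two largest bin probabilities as a confidence score, and the `offsets` matrix of pairwise depth differences (standing in for whatever the geometric relation graph provides) are all assumptions made for illustration.

```python
import numpy as np


def expected_depth(logits, bin_centers):
    """Turn per-bin depth logits into a depth estimate and a confidence.

    The depth is the expectation under a softmax over discrete depth bins.
    The confidence is the mean of the two largest bin probabilities
    (an illustrative choice, not taken from the source).
    """
    logits = np.asarray(logits, dtype=float)
    bin_centers = np.asarray(bin_centers, dtype=float)
    shifted = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    depth = float((probs * bin_centers).sum())
    confidence = float(np.sort(probs)[-2:].mean())
    return depth, confidence


def propagate_depth(depths, confidences, offsets):
    """Refine each object's depth using estimates implied by other objects.

    offsets[i, j] is a hypothetical geometric relation: the depth
    difference d_i - d_j inferred from the scene. Object j then implies
    the estimate depths[j] + offsets[i, j] for object i, and all
    estimates are averaged with the source object's confidence as weight.
    """
    depths = np.asarray(depths, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    estimates = depths[None, :] + offsets    # estimates[i, j]: d_i as seen from j
    weights = np.broadcast_to(confidences[None, :], estimates.shape)
    return (estimates * weights).sum(axis=1) / weights.sum(axis=1)


if __name__ == "__main__":
    depth, conf = expected_depth([0.0, 0.0, 0.0], [10.0, 20.0, 30.0])
    print(depth, conf)

    # Object 0 is confident, object 1 is not; geometry says d_1 - d_0 = 10.
    refined = propagate_depth(
        depths=[10.0, 25.0],
        confidences=[0.9, 0.1],
        offsets=[[0.0, -10.0], [10.0, 0.0]],
    )
    print(refined)
```

In this toy case the uncertain object is pulled strongly toward the depth implied by its confident neighbor, while the confident object barely moves, which is the qualitative behavior the abstract attributes to uncertainty-guided propagation.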
