3D object detection is an essential capability for many practical applications such as driver assistance systems. Monocular 3D detection, an economical alternative to conventional setups relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study of this problem and observes that monocular 3D detection can be reduced to an instance depth estimation problem: inaccurate instance depth is the bottleneck that prevents all the other 3D attribute predictions from improving overall detection performance. However, recent methods directly estimate depth from isolated instances or pixels while ignoring the geometric relations across different objects, which can provide valuable constraints since the key information for depth is not directly observable in a monocular image. Therefore, we construct geometric relation graphs across predicted objects and use these graphs to facilitate depth estimation. As the preliminary depth estimate of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. This provides an important indicator for identifying confident predictions and further guides the depth propagation. Despite the simplicity of the basic idea, our method achieves significant improvements on the KITTI and nuScenes benchmarks, taking 1st place among all monocular vision-only methods while maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
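To make the core idea concrete, below is a minimal PyTorch sketch of uncertainty-guided depth propagation over a geometric relation graph. Everything here is an illustrative assumption rather than the paper's exact formulation: the k-nearest-neighbor edge construction in the image plane, the use of neighbors' raw depth means as the propagated estimates (a real system would correct them with perspective relations between objects), and the inverse-variance fusion rule.

```python
# Hedged sketch of uncertainty-guided depth propagation across instances.
# All shapes, the edge heuristic, and the fusion rule are assumptions made
# for illustration; they are not the released implementation.
import torch


def propagate_depth(centers_2d, depth_mu, depth_sigma, k=3):
    """Refine per-instance depth by fusing each preliminary estimate with
    estimates propagated from its k nearest neighbors in the image plane.

    centers_2d:  (N, 2) projected 2D box centers.
    depth_mu:    (N,)   preliminary per-instance depth means.
    depth_sigma: (N,)   predicted depth standard deviations (uncertainty).
    """
    n = centers_2d.size(0)
    # Pairwise 2D distances define the geometric relation graph; each
    # instance is connected to its k closest neighbors (self excluded).
    dist = torch.cdist(centers_2d, centers_2d)  # (N, N)
    dist.fill_diagonal_(float('inf'))
    k = min(k, n - 1)
    nbr = dist.topk(k, largest=False).indices  # (N, k) neighbor indices

    # Propagated estimates: here simply the neighbors' own depths; the
    # paper's geometric construction would be substituted at this step.
    prop_mu = depth_mu[nbr]        # (N, k)
    prop_sigma = depth_sigma[nbr]  # (N, k)

    # Inverse-variance weighting: confident (low-sigma) predictions
    # dominate the fusion, so reliable instances guide uncertain ones.
    mu = torch.cat([depth_mu.unsqueeze(1), prop_mu], dim=1)
    var = torch.cat([depth_sigma.unsqueeze(1), prop_sigma], dim=1) ** 2
    w = 1.0 / var.clamp(min=1e-6)
    return (w * mu).sum(dim=1) / w.sum(dim=1)
```

The design choice being illustrated is that the probabilistic depth representation serves double duty: its uncertainty both flags which predictions are trustworthy and sets each instance's weight when depth is propagated through the graph.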