Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a critical and challenging task for low-cost urban autonomous driving and mobile robots. Most existing algorithms are based on geometric constraints from 2D-3D correspondences, which stem from generic 6D object pose estimation. We first identify how the ground plane provides additional clues for depth reasoning in 3D detection in driving scenes. Based on this observation, we improve the processing of 3D anchors and introduce a novel neural network module to fully utilize such application-specific priors within a deep learning framework. Finally, we introduce an efficient neural network for 3D object detection that embeds the proposed module. We further verify the power of the proposed module with a neural network designed for monocular depth prediction. The two proposed networks achieve state-of-the-art performance on the KITTI 3D object detection and depth prediction benchmarks, respectively. The code will be published at https://www.github.com/Owen-Liuyuxuan/visualDet3D
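To make the ground-plane prior concrete, the sketch below shows the underlying projective geometry, not the paper's actual module: for a roughly level camera mounted at a known height above a flat road, any pixel assumed to lie on the ground has a closed-form depth from similar triangles, z = f_y * h_cam / (v - c_y). The function name and the KITTI-like intrinsics are illustrative assumptions.

```python
import numpy as np

def ground_plane_depth(v_rows, fy, cy, cam_height):
    """Depth prior for pixels assumed to lie on a flat ground plane.

    For a level camera at height `cam_height` above a flat road, a
    ground point projecting to image row v (below the principal point
    cy) has depth z = fy * cam_height / (v - cy) by similar triangles.
    Rows at or above the horizon (v <= cy) intersect no ground plane,
    so their depth prior is set to infinity.
    """
    v = np.asarray(v_rows, dtype=np.float64)
    depth = np.full_like(v, np.inf)
    below_horizon = v > cy
    depth[below_horizon] = fy * cam_height / (v[below_horizon] - cy)
    return depth

# Example with KITTI-like intrinsics (fy ~ 721 px, cy ~ 173 px) and a
# camera mounted ~1.65 m above the road, as in the KITTI setup.
print(ground_plane_depth([200, 250, 300, 370], fy=721.5, cy=172.8, cam_height=1.65))
```

Such a prior gives each image row (and hence each anchor placed on it) a geometry-based depth estimate before any learning takes place, which is the kind of application-specific clue the abstract refers to; the paper's module learns to exploit it rather than hard-coding it.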