3D object detection plays a significant role in various robotic applications, including self-driving. While many approaches rely on expensive 3D sensors like LiDAR to produce accurate 3D estimates, stereo-based methods have recently shown promising results at a lower cost. Existing methods tackle the problem in two steps: first, depth is estimated and a pseudo LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space. However, because the two separate tasks are optimized in different metric spaces, the depth estimation is biased towards nearby objects, which may cause sub-optimal 3D detection performance. In this paper, we propose a model that unifies these two tasks in the same metric space. Specifically, our model directly constructs a pseudo LiDAR feature volume (PLUME) in 3D space, which is used to solve both the occupancy estimation and object detection tasks. Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
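To make the two-step baseline concrete, the following is a minimal sketch of its first stage: converting stereo disparity to depth and back-projecting a depth map into a pseudo LiDAR point cloud. It illustrates the existing pipeline described above, not the proposed PLUME model. The function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the camera-frame convention (x right, y down, z forward) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def disparity_to_depth(disparity: float, focal_px: float, baseline_m: float) -> float:
    """Stereo triangulation: depth = focal length * baseline / disparity."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


def depth_map_to_points(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project each valid pixel (u, v) with depth z to a 3D point.

    Assumes a pinhole camera; pixels with non-positive depth are
    treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical toy values, chosen only to keep the arithmetic easy to check.
toy_depth = [[2.0, 0.0], [4.0, 2.0]]
toy_points = depth_map_to_points(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

In a two-step pipeline, the resulting point list would then be passed to a separate 3D detector, which is where the mismatch between the depth objective and the detection objective arises.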