3D object detection plays a significant role in various robotic applications, including self-driving. While many approaches rely on expensive 3D sensors such as LiDAR to produce accurate 3D estimates, stereo-based methods have recently shown promising results at a lower cost. Existing methods tackle the problem in two steps: first, depth is estimated and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space. However, because the two separate tasks are optimized in different metric spaces, the depth estimation is biased towards big objects and may cause sub-optimal 3D detection performance. In this paper, we propose a model that unifies these two tasks in the same metric space for the first time. Specifically, our model directly constructs a pseudo-LiDAR feature volume (PLUME) in 3D space, which is used to solve both the occupancy estimation and the object detection tasks. PLUME achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
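The two-step baseline described above (depth estimation followed by conversion to a pseudo-LiDAR point cloud) can be illustrated with a minimal sketch. This is not the PLUME model; it only shows the standard pinhole back-projection that such pipelines typically rely on. The function names, the toy disparity map, and the intrinsics (`fx`, `fy`, `cx`, `cy`, `baseline`) are hypothetical values chosen for illustration.

```python
def stereo_depth(disparity, fx, baseline):
    """Convert a disparity map (pixels) to depth (metres): z = fx * baseline / d.

    Pixels with non-positive disparity are marked invalid with depth 0.0.
    """
    return [[fx * baseline / d if d > 0 else 0.0 for d in row] for row in disparity]


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy, max_depth=80.0):
    """Back-project a dense depth map into 3D points in the camera frame.

    depth[v][u] is the depth in metres at pixel column u, row v.
    fx, fy, cx, cy are pinhole intrinsics in pixels (assumed, not from the paper).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0 or z > max_depth:
                continue  # skip invalid or too-distant pixels
            x = (u - cx) * z / fx  # lateral offset from the optical axis
            y = (v - cy) * z / fy  # vertical offset from the optical axis
            points.append((x, y, z))
    return points


# Toy 2x2 disparity map; all numbers are made up for the example.
disparity = [[10.0, 0.0],
             [20.0, 5.0]]
depth = stereo_depth(disparity, fx=100.0, baseline=0.5)
points = depth_to_pseudo_lidar(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger metric error at long range, which is one way to see why optimizing depth in image space and detection in 3D space can pull in different directions.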