Monocular 3D object detection is drawing increasing attention from the community as it enables cars to perceive the world in 3D with a single camera. However, monocular 3D detection currently suffers from far lower detection accuracy than LiDAR-based methods, which limits its applications. The poor accuracy is mainly caused by the lack of accurate depth cues, a consequence of the ill-posed nature of monocular imagery. LiDAR point clouds, which provide precise depth measurements, can offer beneficial information for training monocular methods. Prior works, however, use LiDAR point clouds only to train a depth estimator. This implicit approach does not fully exploit LiDAR point clouds and consequently leads to suboptimal performance. To make effective use of LiDAR point clouds, in this paper we propose a general, simple yet effective framework for monocular methods. Specifically, we use LiDAR point clouds to directly guide the training of monocular 3D detectors, allowing them to learn the desired objectives while eliminating extra annotation cost. Thanks to its general design, our method can be plugged into any monocular 3D detection method and significantly boosts its performance. As a result, we take first place on the KITTI monocular 3D detection benchmark and raise the BEV/3D AP of the prior state-of-the-art method from 11.88/8.65 to 22.06/16.80 under the hard setting. The code will be made publicly available soon.
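To make the general recipe concrete, below is a minimal PyTorch sketch of one way LiDAR point clouds could directly supervise a monocular detector: 3D box targets are derived from paired LiDAR sweeps instead of human annotation and used as the training objective for an image-only network. The `MonoDetector` module and the `lidar_to_targets` helper are hypothetical placeholders for illustration only, not the paper's actual architecture or guidance mechanism.

```python
# Minimal sketch (PyTorch) of LiDAR-guided training for a monocular 3D detector.
# MonoDetector and lidar_to_targets are hypothetical placeholders; the actual
# guidance mechanism and detector are described in the paper, not here.
import torch
import torch.nn as nn


class MonoDetector(nn.Module):
    """Placeholder monocular 3D detector: image -> per-image 3D box parameters."""

    def __init__(self, num_params=7):  # (x, y, z, w, h, l, yaw)
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_params)

    def forward(self, images):
        return self.head(self.backbone(images))


def lidar_to_targets(point_clouds):
    """Hypothetical stand-in: derive 3D box targets from LiDAR point clouds
    (e.g., via an off-the-shelf LiDAR pipeline), so no human labels are needed."""
    return torch.randn(point_clouds.shape[0], 7)  # dummy targets for illustration


detector = MonoDetector()
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
criterion = nn.SmoothL1Loss()

# One illustrative training step on a dummy batch.
images = torch.randn(2, 3, 224, 224)     # camera images
point_clouds = torch.randn(2, 1024, 4)   # paired LiDAR sweeps (x, y, z, intensity)

targets = lidar_to_targets(point_clouds)      # LiDAR-derived supervision
loss = criterion(detector(images), targets)   # monocular detector learns 3D objectives
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At inference time only the image branch is needed, which is what allows such a framework to be plugged into any monocular 3D detection method without changing its runtime inputs.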