3D object detection is a fundamental and challenging task in 3D scene understanding, and monocular methods offer an economical alternative to stereo-based or LiDAR-based methods. However, accurately detecting objects in 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme that introduces spatial information from LiDAR signals into monocular 3D detectors without adding any cost at inference time. In particular, we first project the LiDAR signals onto the image plane and align them with the RGB images. We then use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, this LiDAR Net serves as the teacher and transfers its learned knowledge to the baseline model. Experimental results show that the proposed method significantly boosts the performance of the baseline model and ranks $1^{st}$ among all monocular-based methods on the KITTI benchmark. In addition, extensive ablation studies further demonstrate the effectiveness of each part of our design and illustrate what the baseline model learns from the LiDAR Net. Our code will be released at \url{https://github.com/monster-ghost/MonoDistill}.
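The projection step mentioned above can be illustrated with a minimal sketch. It is not taken from the released code: the function name `project_to_depth_map`, the assumption that the points are already expressed in the camera frame, and the use of a plain 3x3 intrinsic matrix `K` are all illustrative choices. The sketch rasterizes the points into a sparse depth map aligned with the image and keeps the nearest return when several points fall on the same pixel.

```python
import numpy as np

def project_to_depth_map(points_cam, K, height, width):
    """Rasterize (N, 3) camera-frame points into a sparse (height, width) depth map.

    Hypothetical helper: assumes points are already in the camera frame
    (x right, y down, z forward) and that K is a 3x3 pinhole intrinsic matrix.
    """
    # Start at +inf so that a running minimum keeps the nearest point per pixel.
    depth = np.full((height, width), np.inf, dtype=np.float64)

    # Drop points behind (or exactly on) the camera plane.
    pts = points_cam[points_cam[:, 2] > 0].astype(np.float64)

    # Pinhole projection: [u*z, v*z, z]^T = K @ [x, y, z]^T for each point.
    uvw = pts @ K.T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(np.int64)
    z = pts[:, 2]

    # Keep only points that land inside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Unbuffered minimum handles repeated pixel indices correctly.
    np.minimum.at(depth, (v, u), z)

    # Pixels without any LiDAR return are marked with 0.
    depth[np.isinf(depth)] = 0.0
    return depth
```

The resulting map has the same spatial layout as the RGB image, so under these assumptions it could be fed to a network with the same architecture as the image-based baseline. In a real KITTI pipeline the points would first need to be transformed from the LiDAR frame to the camera frame using the calibration files, which this sketch leaves out.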