Monocular 3D object detection is a critical yet challenging task for autonomous driving, owing to the lack of the accurate depth information that LiDAR sensors provide. In this paper, we propose a stereo-guided monocular 3D object detection network, termed SGM3D, which leverages robust 3D features extracted from stereo images to enhance the features learned from a monocular image. We introduce a multi-granularity domain adaptation module (MG-DA) that trains the network to generate stereo-mimicking features from monocular cues alone: both coarse BEV feature-level and fine anchor-level domain adaptation are used to guide the monocular branch. We further present an IoU matching-based alignment module (IoU-MA) that performs object-level domain adaptation between the stereo and monocular predictions, alleviating mismatches left by the earlier stages. Extensive experiments on the challenging KITTI and Lyft datasets show that our method achieves new state-of-the-art performance. Furthermore, our method can be integrated into many other monocular approaches to boost their performance without introducing any extra computational cost.
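To make the feature-level guidance concrete, below is a minimal sketch of the BEV feature-mimicking idea, assuming PyTorch and hypothetical `stereo_backbone` / `mono_backbone` modules that output bird's-eye-view (BEV) feature maps of the same shape. The paper's exact losses and granularities are not reproduced here; an L2 mimicking loss against a detached stereo teacher is used purely for illustration.

```python
# Illustrative sketch of coarse BEV feature-level domain adaptation.
# Assumptions (not from the paper): an L2 mimicking loss, a frozen stereo
# teacher, and BEV features of identical shape from both branches.
import torch
import torch.nn.functional as F

def bev_mimic_loss(mono_bev: torch.Tensor, stereo_bev: torch.Tensor) -> torch.Tensor:
    """Push monocular BEV features toward the (detached) stereo BEV features."""
    return F.mse_loss(mono_bev, stereo_bev.detach())

# Hypothetical training step:
#   stereo_bev = stereo_backbone(left_img, right_img)  # teacher branch
#   mono_bev   = mono_backbone(left_img)               # student branch
#   loss = detection_loss + lambda_da * bev_mimic_loss(mono_bev, stereo_bev)
```

Because the stereo branch only supplies supervision during training, the monocular branch runs alone at inference time, which is why no extra computational cost is incurred.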
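For the object-level stage, the following sketch shows one plausible form of IoU matching between the two branches' predictions. It assumes axis-aligned 2D BEV boxes encoded as `(x1, y1, x2, y2)`; the paper's IoU-MA operates on full 3D predictions and is more involved, so the helpers below are hypothetical.

```python
# Illustrative sketch of IoU matching-based alignment between monocular and
# stereo predictions. Assumptions (not from the paper): 2D BEV boxes, greedy
# best-overlap matching, and a fixed IoU threshold of 0.5.
import torch

def pairwise_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU between every box in `a` (N, 4) and every box in `b` (M, 4)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])   # top-left of intersection
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])   # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-6)

def match_predictions(mono_boxes: torch.Tensor, stereo_boxes: torch.Tensor,
                      thresh: float = 0.5):
    """Pair each monocular box with its best-overlapping stereo box; matched
    pairs can then be aligned with a regression-style consistency loss."""
    iou = pairwise_iou(mono_boxes, stereo_boxes)
    best_iou, best_idx = iou.max(dim=1)
    keep = best_iou > thresh
    return keep.nonzero(as_tuple=True)[0], best_idx[keep]
```

Matching by IoU rather than by fixed anchor index lets the alignment tolerate the spatial mismatches between the two branches' predictions that the earlier feature- and anchor-level stages cannot fully resolve.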