Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic, as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, because each sensor lacks information along a perpendicular axis: depth is unknown to the camera, and elevation is unknown to the radar. We propose the camera-radar matching network CramNet, an efficient approach to fusing the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.
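The sensor modality dropout mentioned above can be illustrated with a minimal sketch: during training, one sensor's features are randomly zeroed out so the model learns to detect objects from the remaining modality alone. The function name, feature representation, and drop probability below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def apply_modality_dropout(camera_feat, radar_feat, p_drop=0.2):
    """Hypothetical sketch of sensor modality dropout.

    With probability p_drop the camera features are zeroed, with
    probability p_drop the radar features are zeroed, and otherwise
    both modalities are kept. Features are plain lists of floats
    here for simplicity; a real system would use tensors.
    """
    r = random.random()
    if r < p_drop:
        # Drop camera: the model must rely on radar alone.
        camera_feat = [0.0] * len(camera_feat)
    elif r < 2 * p_drop:
        # Drop radar: the model must rely on camera alone.
        radar_feat = [0.0] * len(radar_feat)
    # Otherwise keep both modalities for normal fused training.
    return camera_feat, radar_feat
```

Training with such perturbations is what allows the fused detector to degrade gracefully when one sensor malfunctions at test time.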