Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, as each of the sensors lacks information along a perpendicular axis; that is, depth is unknown to the camera and elevation is unknown to the radar. We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.
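To make the ray-constrained cross-attention idea concrete, the sketch below is a minimal, illustrative rendering of it, not the paper's implementation: a camera pixel feature acts as the attention query, while the keys and values are radar features sampled at candidate depths along that pixel's 3D viewing ray, so the attention weights directly yield a refined depth estimate. All names and shapes (`cam_feat`, `radar_feats`, `candidate_depths`, the feature dimension, the number of ray samples) are assumptions chosen for the example.

```python
import numpy as np

def ray_constrained_cross_attention(cam_feat, radar_feats, candidate_depths):
    """Illustrative sketch of ray-constrained cross-attention (assumed shapes).

    cam_feat:         (d,)   feature of one camera pixel (the query).
    radar_feats:      (k, d) radar features sampled at k points along the
                             pixel's 3D viewing ray (keys and values).
    candidate_depths: (k,)   depth of each sampled point along the ray.

    Returns a refined depth as the attention-weighted average of the
    candidate depths, plus the attention weights themselves.
    """
    d = cam_feat.shape[0]
    # Scaled dot-product attention, restricted to points on the ray:
    # the geometric constraint is enforced by only sampling keys/values
    # along the camera ray, which removes ambiguous correspondences.
    scores = radar_feats @ cam_feat / np.sqrt(d)       # (k,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over ray samples
    refined_depth = float(weights @ candidate_depths)  # expected depth along ray
    return refined_depth, weights

# Toy usage: one pixel, 8 depth candidates between 2 m and 30 m.
rng = np.random.default_rng(0)
cam_feat = rng.normal(size=64)
radar_feats = rng.normal(size=(8, 64))
candidate_depths = np.linspace(2.0, 30.0, 8)
depth, w = ray_constrained_cross_attention(cam_feat, radar_feats, candidate_depths)
```

The key design point the sketch captures is that attention is computed only over radar features lying on the pixel's ray, so a strong radar return at some range pulls the depth estimate toward that range rather than toward an arbitrary location in the scene.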