Radar is usually more robust than the camera in adverse driving scenarios, e.g., weak/strong lighting and bad weather. However, unlike RGB images captured by a camera, semantic information is noticeably harder to extract from radar signals. In this paper, we propose a deep radar object detection network (RODNet) that effectively detects objects purely from carefully processed radar frequency data in the format of range-azimuth frequency heatmaps (RAMaps). Three different 3D-autoencoder-based architectures are introduced to predict object confidence distributions from each snippet of the input RAMaps. The final detection results are then obtained with our post-processing method, called location-based non-maximum suppression (L-NMS). Instead of relying on burdensome human-labeled ground truth, we train the RODNet using annotations generated automatically by a novel 3D localization method based on a camera-radar fusion (CRF) strategy. To train and evaluate our method, we build a new dataset -- CRUW, containing synchronized videos and RAMaps in various driving scenarios. Extensive experiments show that our RODNet achieves favorable object detection performance without the presence of the camera.
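To make the L-NMS post-processing step concrete, below is a minimal Python sketch of greedy location-based NMS. It assumes detections are peaks extracted from the RODNet confidence maps, each carrying a (range, azimuth) location and a confidence score, and that duplicates are suppressed by a distance-based similarity in Cartesian coordinates rather than bounding-box IoU. The similarity form, the range-dependent scale, and the threshold value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l_nms(detections, sim_thresh=0.3):
    """Sketch of location-based NMS: suppress near-duplicate peaks by
    location similarity instead of bounding-box IoU.

    detections: list of dicts with keys
        'loc'  : (range_m, azimuth_rad) of the detected peak
        'conf' : confidence score from the output heatmap
    sim_thresh and the similarity form below are assumptions.
    """
    def to_xy(loc):
        # Convert polar (range, azimuth) to Cartesian for distance computation.
        r, a = loc
        return np.array([r * np.sin(a), r * np.cos(a)])

    def loc_sim(d1, d2, kappa=1.0):
        # Similarity is 1 when co-located and decays with distance;
        # scaling by object range tolerates larger error for far objects.
        dist = np.linalg.norm(to_xy(d1['loc']) - to_xy(d2['loc']))
        scale = max(d1['loc'][0], 1.0)  # object range as a rough scale
        return np.exp(-dist**2 / (2 * scale * kappa))

    kept = []
    # Greedy NMS: take the most confident peak, drop nearby duplicates.
    for det in sorted(detections, key=lambda d: d['conf'], reverse=True):
        if all(loc_sim(det, k) < sim_thresh for k in kept):
            kept.append(det)
    return kept
```

For example, two peaks of confidence 0.9 and 0.7 detected a fraction of a meter apart would collapse to the single higher-confidence detection, while well-separated peaks are all retained.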