We propose DeepFusion, a modular multi-modal architecture that fuses lidars, cameras, and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into a bird's-eye-view representation as a common space for fusion. Spatial and semantic alignment is performed before the modalities are fused in feature space. Finally, a detection head exploits the rich multi-modal features for improved 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar, and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of our lidar-camera fusion. Furthermore, we investigate the lidar point density required for 3D object detection and illustrate the implications using robustness against adverse weather conditions as an example. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.
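To make the described pipeline concrete, the following is a minimal sketch of the fusion step: per-modality features that have already been projected into a shared bird's-eye-view grid are concatenated and mixed before being passed to a detection head. All module names, tensor shapes, and the concatenation-plus-convolution fusion choice are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of BEV-space multi-modal fusion (not the paper's code).
import torch
import torch.nn as nn


class BEVFuser(nn.Module):
    """Fuses per-modality bird's-eye-view feature maps.

    Assumes each modality-specific extractor has already produced a feature
    map on the same BEV grid, so channel-wise concatenation followed by a
    1x1 convolution is one plausible way to combine them.
    """

    def __init__(self, per_modality_channels: int, num_modalities: int,
                 out_channels: int) -> None:
        super().__init__()
        self.fuse = nn.Conv2d(
            per_modality_channels * num_modalities, out_channels, kernel_size=1
        )

    def forward(self, bev_features: list[torch.Tensor]) -> torch.Tensor:
        # All inputs share the same spatial BEV grid; concatenate along the
        # channel dimension and mix channels with the 1x1 convolution.
        return self.fuse(torch.cat(bev_features, dim=1))


if __name__ == "__main__":
    # Example: two modalities (e.g. lidar + camera), each contributing a
    # 64-channel feature map on a 200x200 BEV grid (shapes are assumptions).
    lidar_bev = torch.randn(1, 64, 200, 200)
    camera_bev = torch.randn(1, 64, 200, 200)
    fuser = BEVFuser(per_modality_channels=64, num_modalities=2, out_channels=128)
    fused = fuser([lidar_bev, camera_bev])
    print(fused.shape)  # torch.Size([1, 128, 200, 200])
```

Because fusion happens in a shared BEV representation, swapping a modality in or out only changes the list of input feature maps and the channel count of the fusion layer, which reflects the modularity the abstract emphasizes.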