The camera and the LiDAR are two critical sensors for 3D perception in autonomous driving. The camera provides rich semantic information such as color and texture, while the LiDAR captures the 3D shape and location of surrounding objects. Fusing these two modalities has been shown to significantly boost the performance of 3D perception models, as each modality carries information complementary to the other. However, we observe that current datasets are captured from expensive vehicles explicitly designed for data collection, and for various reasons cannot truly reflect realistic data distributions. To this end, we collect a series of real-world cases with noisy data distributions and systematically formulate a robustness benchmark toolkit that simulates these cases on any clean autonomous driving dataset. We showcase the effectiveness of our toolkit by establishing robustness benchmarks on two widely adopted autonomous driving datasets, nuScenes and Waymo, and then, to the best of our knowledge, holistically benchmark state-of-the-art fusion methods for the first time. We observe that: i) most fusion methods, when developed solely on these data, tend to fail inevitably when there is a disruption to the LiDAR input; ii) the improvement from the camera input is significantly inferior to that from the LiDAR input. We further propose an efficient robust training strategy to improve the robustness of current fusion methods. The benchmark and code are available at https://github.com/kcyu2014/lidar-camera-robust-benchmark
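To make the idea of "simulating noisy cases on a clean dataset" concrete, here is a minimal sketch of what such a LiDAR-side corruption could look like: randomly dropping points and restricting the horizontal field of view of a clean point cloud before it is fed to a fusion detector. The function names and parameters below are illustrative assumptions, not the actual API of the released toolkit.

```python
import numpy as np


def drop_lidar_points(points: np.ndarray, drop_ratio: float = 0.5,
                      rng: np.random.Generator = None) -> np.ndarray:
    """Randomly drop a fraction of LiDAR points (N x 4: x, y, z, intensity)
    to mimic degraded or partially missing LiDAR returns.
    Hypothetical corruption, not the toolkit's actual implementation."""
    rng = rng if rng is not None else np.random.default_rng()
    keep_mask = rng.random(points.shape[0]) >= drop_ratio
    return points[keep_mask]


def limit_lidar_fov(points: np.ndarray, fov_deg: float = 120.0) -> np.ndarray:
    """Keep only points within a frontal horizontal field of view,
    mimicking a cheaper sensor setup with reduced coverage."""
    azimuth = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
    return points[np.abs(azimuth) <= fov_deg / 2.0]


# Usage sketch: corrupt a clean frame before running the fusion model.
# `load_points` is a hypothetical loader for a nuScenes/Waymo LiDAR sweep.
# points = load_points("frame.bin")
# noisy = limit_lidar_fov(drop_lidar_points(points, drop_ratio=0.3))
```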