Most autonomous vehicles (AVs) rely on LiDAR and RGB camera sensors for perception. Using these point cloud and image data, perception models based on deep neural networks (DNNs) have achieved state-of-the-art performance in 3D detection. The vulnerability of DNNs to adversarial attacks has been heavily investigated in the RGB image domain and, more recently, in the point cloud domain, but rarely in both domains simultaneously. Multi-modal perception systems used in AVs can be divided into two broad types: cascaded models, which use each modality independently, and fusion models, which learn from the different modalities simultaneously. We propose a universal and physically realizable adversarial attack for each type, and study and contrast their respective vulnerabilities. We place a single adversarial object with a specific shape and texture on top of a car with the objective of making this car evade detection. Evaluating on the popular KITTI benchmark, our adversarial object made the host vehicle escape detection by each model type nearly 50% of the time. The dense RGB input contributed more to the success of the adversarial attacks on both cascaded and fusion models. We found that the fusion model was relatively more robust to adversarial attacks than the cascaded model.