Modern self-driving perception systems have been shown to improve by processing complementary inputs such as LiDAR alongside images. In isolation, 2D images have been found to be extremely vulnerable to adversarial attacks. Yet, there have been limited studies on the adversarial robustness of multi-modal models that fuse LiDAR features with image features. Furthermore, existing works do not consider physically realizable perturbations that are consistent across the input modalities. In this paper, we showcase practical susceptibilities of multi-sensor detection by placing an adversarial object on top of a host vehicle. We focus on physically realizable and input-agnostic attacks because they are feasible to execute in practice, and we show that a single universal adversary can hide different host vehicles from state-of-the-art multi-modal detectors. Our experiments demonstrate that successful attacks are primarily caused by easily corrupted image features. Furthermore, we find that in modern sensor fusion methods that project image features into 3D, adversarial attacks can exploit the projection process to generate false positives across distant regions in 3D. Towards more robust multi-modal perception systems, we show that adversarial training with feature denoising can significantly boost robustness to such attacks. However, we find that standard adversarial defenses still struggle to prevent false positives caused by inaccurate associations between 3D LiDAR points and 2D pixels.
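To make the projection-based fusion mechanism concrete, below is a minimal NumPy sketch of how LiDAR-camera fusion methods typically gather per-point image features by projecting 3D points into the camera view. This is an illustrative sketch under standard pinhole-camera assumptions, not the paper's implementation; all names (`lift_image_features_to_lidar`, `T_cam_from_lidar`, etc.) are hypothetical. It shows why a compact adversarial patch in the image can corrupt the features attached to LiDAR points spread across many depths in 3D.

```python
import numpy as np

def lift_image_features_to_lidar(points_lidar, feat_map, K, T_cam_from_lidar):
    """Gather per-point image features by projecting LiDAR points into the camera.

    points_lidar:     (N, 3) LiDAR points in the sensor frame.
    feat_map:         (C, H, W) feature map from the image branch of the detector.
    K:                (3, 3) camera intrinsics.
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR frame -> camera frame.
    Returns:          (N, C) image feature for each LiDAR point.
    """
    C, H, W = feat_map.shape
    # Transform points into the camera frame (homogeneous coordinates).
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1  # keep only points in front of the camera

    # Pinhole projection to pixel coordinates.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)

    # Every 3D point that lands on the same pixel region inherits the same
    # (possibly adversarially perturbed) image features, so a small 2D patch
    # can influence 3D points at many different ranges at once.
    point_feats = feat_map[:, v, u].T  # advanced indexing -> (N, C) copy
    point_feats[~in_front] = 0.0       # zero out points behind the camera
    return point_feats
```

Because this point-to-pixel association depends only on geometry, an inaccurate calibration or an adversarially chosen patch lets corrupted 2D features propagate to distant 3D regions, which is consistent with the false positives described above.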