Deep neural networks have become the driving force of modern image recognition systems. However, the vulnerability of neural networks to adversarial attacks poses a serious threat to the people affected by these systems. In this paper, we focus on a real-world threat model in which a Man-in-the-Middle adversary maliciously intercepts and perturbs images that web users upload online. This type of attack can raise severe ethical concerns beyond mere performance degradation. To prevent this attack, we devise a novel bi-level optimization algorithm that finds points in the vicinity of natural images that are robust to adversarial perturbations. Experiments on CIFAR-10 and ImageNet show that our method can effectively robustify natural images within the given modification budget. We also show that the proposed method can improve robustness when jointly used with randomized smoothing.
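To make the bi-level formulation concrete, the following is a minimal sketch, not the authors' actual algorithm: an inner PGD loop approximates the worst-case adversarial perturbation, and an outer signed-gradient step searches, within an L∞ modification budget, for a point near the natural image whose worst-case loss stays low. The PyTorch framing, function names, and hyperparameters (`pgd_attack`, `robustify_image`, the 8/255 budgets, step counts) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Inner maximization (assumed PGD): find an adversarial perturbation
    within an L-inf ball of radius `eps` around x."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def robustify_image(model, x, y, budget=8/255, attack_eps=8/255,
                    outer_steps=50, outer_lr=1/255):
    """Outer minimization (sketch): nudge x within `budget` so that the
    worst-case adversarial loss around the modified point stays low."""
    r = torch.zeros_like(x, requires_grad=True)  # modification applied to the natural image
    for _ in range(outer_steps):
        x_rob = torch.clamp(x + r, 0, 1)
        # Approximate the inner maximizer around the current robustified point.
        delta = pgd_attack(model, x_rob.detach(), y, attack_eps, attack_eps / 4, steps=10)
        # Descend on the outer objective evaluated at that worst-case perturbation.
        loss = F.cross_entropy(model(torch.clamp(x_rob + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, r)
        with torch.no_grad():
            r = (r - outer_lr * grad.sign()).clamp(-budget, budget)
        r.requires_grad_(True)
    return torch.clamp(x + r, 0, 1).detach()
```

In this sketch the outer update treats the inner maximizer as fixed (a Danskin-style approximation common in adversarial training); the paper's actual update rule and budget constraints may differ.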