The literature on adversarial attacks in computer vision typically focuses on pixel-level perturbations, which tend to be difficult for humans to interpret. Recent work that manipulates the latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks. We make three contributions. First, we observe that feature-level attacks provide useful classes of inputs for studying representations in models. Second, we show that these adversaries are versatile and highly robust: they can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale. Third, we show how these adversarial images can be used as a practical interpretability tool for identifying bugs in networks. We use these adversaries to make predictions about spurious associations between features and classes, which we then test by designing "copy/paste" attacks in which one natural image is pasted into another to cause a targeted misclassification. Our results suggest that feature-level attacks are a promising approach for rigorous interpretability research. They support the design of tools to better understand what a model has learned and to diagnose brittle feature associations. Code is available at https://github.com/thestephencasper/feature_level_adv.
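To make the idea of a feature-level attack concrete, the sketch below optimizes a generator's latent code, rather than raw pixels, so that a classifier assigns an attacker-chosen target class. This is a minimal illustration, not the authors' implementation (see the linked repository for that): the generator here is an untrained placeholder so the snippet runs end to end, whereas a real attack at ImageNet scale would use a pretrained image generator; the classifier is a torchvision ResNet-50, and all hyperparameters are arbitrary choices for illustration.

```python
# Minimal sketch of a feature-level adversarial attack: optimize a latent
# code z through an image generator so the generated image is classified
# as an attacker-chosen target class. Assumptions: the generator below is
# an untrained stand-in; a real attack would use a pretrained generator.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder generator: maps a 128-d latent to a 224x224 RGB image in [0, 1].
generator = nn.Sequential(
    nn.Linear(128, 3 * 224 * 224),
    nn.Sigmoid(),
    nn.Unflatten(1, (3, 224, 224)),
).to(device)
generator.requires_grad_(False)  # only the latent code is optimized

classifier = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).to(device).eval()
classifier.requires_grad_(False)

# ImageNet normalization expected by the torchvision classifier.
mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225], device=device).view(1, 3, 1, 1)

target_class = 309  # attacker-chosen ImageNet class index (arbitrary example)
z = torch.randn(1, 128, device=device, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    image = generator(z)  # the image stays on the generator's output manifold
    logits = classifier((image - mean) / std)
    # Targeted attack: minimize cross-entropy to the chosen target class.
    loss = nn.functional.cross_entropy(
        logits, torch.tensor([target_class], device=device)
    )
    loss.backward()
    optimizer.step()

adv_image = generator(z).detach()  # perceptible, feature-level adversarial image
```

Because the perturbation lives in the generator's latent space, the resulting changes are feature-like and visible rather than imperceptible pixel noise, which is what makes them useful for interpreting what a classifier has learned.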