Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks

Deep neural networks (DNNs) are powerful, but they can make mistakes that pose significant risks. A model performing well on a test set does not imply safety in deployment, so it is important to have additional tools to understand its flaws. Adversarial examples can help reveal weaknesses, but they are often difficult for a human to interpret or to draw generalizable, actionable conclusions from. Some previous works have addressed this by studying human-interpretable attacks. We build on these with three contributions. First, we introduce Search for Natural Adversarial Features Using Embeddings (SNAFUE), a fully automated method for finding "copy/paste" attacks, in which one natural image is pasted into another in order to induce an unrelated misclassification. Second, we use this method to red team an ImageNet classifier and identify hundreds of easily describable sets of vulnerabilities. Third, we compare this approach with other interpretability tools by attempting to rediscover trojans. Our results suggest that SNAFUE can be useful for interpreting DNNs and generating adversarial data for them. Code is available at https://github.com/thestephencasper/snafue
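
To make the notion of a "copy/paste" attack concrete, the following is a minimal sketch of how such an attack might be scored once a candidate patch is in hand. It does not implement SNAFUE's embedding-based search, which the abstract does not detail. The helper names `paste_patch` and `attack_success_rate`, the fixed paste location, and the brightness-based `toy_classify` function are all assumptions for illustration, standing in for a real image classifier.

```python
import numpy as np


def paste_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out


def attack_success_rate(classify, images, patch, target_class, top=0, left=0):
    """Fraction of images, not already predicted as `target_class`,
    whose prediction becomes `target_class` after the patch is pasted."""
    eligible = [img for img in images if classify(img) != target_class]
    if not eligible:
        return 0.0
    hits = sum(
        classify(paste_patch(img, patch, top, left)) == target_class
        for img in eligible
    )
    return hits / len(eligible)


def toy_classify(image):
    # Hypothetical stand-in for a real DNN: class 1 if the image is
    # bright on average, class 0 otherwise.
    return 1 if image.mean() > 0.2 else 0


# Four dark source images and two candidate "natural" patches (toy data).
images = [np.zeros((8, 8, 3)) for _ in range(4)]
large_patch = np.ones((4, 4, 3))
small_patch = np.ones((2, 2, 3))

large_rate = attack_success_rate(toy_classify, images, large_patch, target_class=1)
small_rate = attack_success_rate(toy_classify, images, small_patch, target_class=1)
```

In this toy setup the larger patch flips every source image to the target class, while the smaller one flips none. With a real classifier, the same loop could be run over candidate patches and source images to rank patches by how reliably they induce the target misclassification.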

