何时以及如何愚弄可解释的模型(和人), (When and How to Fool Explainable Models (and Humans) with Adversarial Examples) - 专知论文

会员服务 ·

0

样例 · MoDELS · Machine Learning · 学成 · 可辨认的 ·

2021 年 7 月 5 日

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

翻译：何时以及如何愚弄可解释的模型(和人),

Jon Vadillo,Roberto Santana,Jose A. Lozano

from arxiv, 12 pages, 1 figure

Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this paper, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing novel attack paradigms. In particular, our framework considers a wide range of relevant (yet often ignored) factors such as the type of problem, the user expertise or the objective of the explanations in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). These contributions intend to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.

翻译：由于若干限制,可靠地部署神经网络等机器学习模型仍然具有挑战性,其中的一些主要缺点是缺乏解释性,对对抗性实例或分配外的投入缺乏强力。在本文件中,我们探讨了可解释的机器学习模型的对抗性攻击的可能性和限度。首先,我们扩展了对抗性例子的概念,以适应可解释的机器学习假设,在这种假设中,投入、产出分类和模型决定的解释由人来评估。接着,我们提议了一个全面框架,研究是否可以(以及如何)产生对抗性例子,用于在人类评估中解释可解释的模式,引入新的攻击范式。特别是,我们的框架考虑了一系列相关的(经常被忽视的)因素,如问题的类型、用户专门知识或解释的目的,以便确定在每个假设中应当采用的攻击战略,成功地欺骗模型(和人)。这些贡献的目的是要作为基础,对可解释的机器学习领域的对抗性例子进行更加严格和现实的研究。

0

相关内容

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

机器学习相关资源(框架、库、软件)大列表

机器学习相关资源(框架、库、软件)大列表

专知会员服务

40+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

全球首例！Adversarial T-shirt让你在AI目标检测系统中隐身

全球首例！Adversarial T-shirt让你在AI目标检测系统中隐身

CVer

4+阅读 · 2020年7月7日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples

On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples

Arxiv

0+阅读 · 2021年9月6日

Training Meta-Surrogate Model for Transferable Adversarial Attack

Arxiv

1+阅读 · 2021年9月5日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

Taking Over the Stock Market: Adversarial Perturbations Against Algorithmic Traders

Arxiv

0+阅读 · 2021年9月2日

Backdoor Learning: A Survey

Arxiv

14+阅读 · 2020年10月26日

Counterfactual Explanations for Machine Learning: A Review

Arxiv

25+阅读 · 2020年10月20日

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Arxiv

17+阅读 · 2019年10月9日

Generative Adversarial Networks: A Survey and Taxonomy

Generative Adversarial Networks: A Survey and Taxonomy

Arxiv

14+阅读 · 2019年6月4日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

Explaining and Harnessing Adversarial Examples

Arxiv

4+阅读 · 2015年3月20日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

机器学习相关资源(框架、库、软件)大列表

机器学习相关资源(框架、库、软件)大列表

专知会员服务

40+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

全球首例！Adversarial T-shirt让你在AI目标检测系统中隐身

全球首例！Adversarial T-shirt让你在AI目标检测系统中隐身

CVer

4+阅读 · 2020年7月7日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples

On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples

Arxiv

0+阅读 · 2021年9月6日

Training Meta-Surrogate Model for Transferable Adversarial Attack

Arxiv

1+阅读 · 2021年9月5日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

Taking Over the Stock Market: Adversarial Perturbations Against Algorithmic Traders

Arxiv

0+阅读 · 2021年9月2日

Backdoor Learning: A Survey

Arxiv

14+阅读 · 2020年10月26日

Counterfactual Explanations for Machine Learning: A Review

Arxiv

25+阅读 · 2020年10月20日

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Arxiv

17+阅读 · 2019年10月9日

Generative Adversarial Networks: A Survey and Taxonomy

Generative Adversarial Networks: A Survey and Taxonomy

Arxiv

14+阅读 · 2019年6月4日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

Explaining and Harnessing Adversarial Examples

Arxiv

4+阅读 · 2015年3月20日

微信扫码咨询专知VIP会员