Artificial intelligence (AI) comes with great opportunities but can also pose significant risks. Automatically generated explanations for decisions can increase transparency and foster trust, especially for systems based on automated predictions by AI models. However, given incentives, e.g., economic ones, to create dishonest AI, to what extent can we trust explanations? To address this issue, our work investigates how AI models, i.e., deep learning, and existing instruments for increasing transparency regarding AI decisions can be used to create and detect deceptive explanations. As an empirical evaluation, we focus on text classification and alter the explanations generated by GradCAM, a well-established explanation technique for neural networks. We then evaluate the effect of deceptive explanations on users in an experiment with 200 participants. Our findings confirm that deceptive explanations can indeed fool humans. However, one can deploy machine learning (ML) methods to detect seemingly minor deception attempts with accuracy exceeding 80% given sufficient domain knowledge. Without domain knowledge, one can still infer inconsistencies in the explanations in an unsupervised manner, given basic knowledge of the predictive model under scrutiny.
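To illustrate the kind of attribution that is generated and then manipulated, the following is a minimal sketch of Grad-CAM applied to a toy 1D-CNN text classifier in PyTorch. The architecture, vocabulary size, and layer choices here are illustrative assumptions, not the classifiers or data used in our experiments; the sketch only shows how per-token relevance scores of the form ReLU(sum_k alpha_k * A_k) can be obtained from a convolutional layer's feature maps and their gradients.

```python
# Minimal Grad-CAM sketch for a toy 1D-CNN text classifier (PyTorch).
# Model, vocabulary, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, n_filters=32, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):
        x = self.emb(token_ids).transpose(1, 2)   # (B, emb_dim, seq_len)
        self.feat = F.relu(self.conv(x))          # feature maps kept for Grad-CAM
        self.feat.retain_grad()                   # keep gradients of non-leaf tensor
        pooled = self.feat.max(dim=2).values      # global max pool over positions
        return self.fc(pooled)                    # class logits

def grad_cam(model, token_ids, target_class):
    """Per-token relevance: ReLU(sum_k alpha_k * A_k), alpha_k = mean_t dy_c/dA_{k,t}."""
    logits = model(token_ids)
    model.zero_grad()
    logits[0, target_class].backward()
    weights = model.feat.grad.mean(dim=2, keepdim=True)  # (B, n_filters, 1)
    cam = F.relu((weights * model.feat).sum(dim=1))      # (B, seq_len)
    return cam / (cam.max() + 1e-8)                      # normalize to [0, 1]

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TextCNN()
    tokens = torch.randint(0, 1000, (1, 12))   # one toy 12-token input
    relevance = grad_cam(model, tokens, target_class=1)
    print(relevance)                           # relevance score per token position
```

Scores of this form are the explanations that a dishonest provider could rescale or redistribute across tokens before presenting them to users, which is the setting our creation and detection experiments address.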