Attention mechanisms have come to dominate the explainability of deep models. They produce probability distributions over the input, which are widely deemed feature-importance indicators. However, in this paper, we find one critical limitation in attention explanations: a weakness in identifying the polarity of feature impact. This can be misleading: features with higher attention weights may not faithfully contribute to model predictions; instead, they can impose suppression effects. With this finding, we reflect on the explainability of current attention-based techniques, such as Attention$\odot$Gradient and LRP-based attention explanations. We first propose an actionable diagnostic methodology (henceforth, the faithfulness violation test) to measure the consistency between explanation weights and impact polarity. Through extensive experiments, we then show that most tested explanation methods are unexpectedly hindered by the faithfulness violation issue, especially raw attention. Empirical analyses of the factors affecting violations further provide useful observations for adopting explanation methods in attention models.
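To make the diagnostic idea concrete, the following is a minimal sketch, not the paper's exact protocol: a feature's impact polarity is estimated by masking it and observing the change in the model's score, and a faithfulness violation is flagged whenever that polarity disagrees with the sign of the feature's explanation weight. The model `f`, the `baseline` values used for masking, and the helper names `impact_polarity` and `violation_rate` are illustrative assumptions.

```python
import numpy as np

def impact_polarity(f, x, baseline, idx):
    """Estimate the impact polarity of feature `idx`: positive if masking the
    feature lowers the model score (the feature supports the prediction),
    negative if masking raises the score (the feature suppresses it)."""
    x_masked = x.copy()
    x_masked[idx] = baseline[idx]
    return np.sign(f(x) - f(x_masked))

def violation_rate(f, x, baseline, weights):
    """Fraction of features whose explanation weight sign disagrees with the
    measured impact polarity, i.e. a simple faithfulness violation score.
    Raw attention weights are non-negative, so any feature with a negative
    (suppressive) impact counts as a violation under this check."""
    violations = sum(
        np.sign(w) != impact_polarity(f, x, baseline, idx)
        for idx, w in enumerate(weights)
    )
    return violations / len(weights)
```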