Attention-based methods play an important role in model interpretation, where the learned attention weights are expected to highlight the critical parts of inputs~(e.g., keywords in sentences). However, recent research has found that attention-as-importance interpretations often do not work as expected: learned attention weights sometimes highlight less meaningful tokens such as "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. A recent debate over whether attention is an explanation has drawn considerable interest. In this paper, we demonstrate that one root cause of this phenomenon is combinatorial shortcuts: beyond indicating the highlighted parts, the attention weights themselves may carry extra information that downstream models after the attention layers can exploit. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods can effectively improve the interpretability of attention mechanisms.
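To make the combinatorial-shortcut idea concrete, the following is a minimal toy sketch (not the paper's experiment; all names and values are hypothetical). Two inputs contain the same "important" token, yet their attention weights differ, so the attended output alone can reveal the label through the weight magnitudes rather than through the highlighted content:

```python
import numpy as np

# Hypothetical value vectors: both inputs share the same "important" token.
v_key = np.array([1.0, 0.0])    # value of the highlighted token
v_other = np.array([0.0, 1.0])  # value of a filler token

# Attention weights differ by class: 0.9 on the key token for class A,
# only 0.6 for class B. Which token is highlighted is identical.
w_a = np.array([0.9, 0.1])
w_b = np.array([0.6, 0.4])

# Standard attention readout: weighted sum of value vectors.
out_a = w_a[0] * v_key + w_a[1] * v_other
out_b = w_b[0] * v_key + w_b[1] * v_other

# The two outputs differ even though the highlighted token is the same,
# so a downstream model can decode the class from the weights themselves.
print(out_a, out_b)
```

In this sketch the weights act as an extra information channel, which is why they stop being pure importance indicators.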