Attention-based methods play an important role in model interpretation, where the learned attention weights are expected to highlight the critical parts of inputs~(e.g., keywords in sentences). However, recent research has found that attention-as-importance interpretations often do not work as expected: learned attention weights sometimes highlight less meaningful tokens such as "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. A recent debate over whether attention is an explanation has drawn considerable interest. In this paper, we demonstrate that one root cause of this phenomenon is combinatorial shortcuts: beyond indicating the highlighted parts, the attention weights themselves may carry extra information that downstream models after the attention layers can exploit. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods can effectively improve the interpretability of attention mechanisms.
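To make the combinatorial-shortcut idea concrete, the following is a minimal toy sketch (not the paper's experiment; all names and values are hypothetical). Two inputs contain the same "important" token, yet their attention weights differ, so the attended output alone can reveal the label through the weight magnitudes rather than through the highlighted content:

```python
import numpy as np

# Hypothetical value vectors: both inputs share the same "important" token.
v_key = np.array([1.0, 0.0])    # value of the highlighted token
v_other = np.array([0.0, 1.0])  # value of a filler token

# Attention weights differ by class: 0.9 on the key token for class A,
# only 0.6 for class B. Which token is highlighted is identical.
w_a = np.array([0.9, 0.1])
w_b = np.array([0.6, 0.4])

# Standard attention readout: weighted sum of value vectors.
out_a = w_a[0] * v_key + w_a[1] * v_other
out_b = w_b[0] * v_key + w_b[1] * v_other

# The two outputs differ even though the highlighted token is the same,
# so a downstream model can decode the class from the weights themselves.
print(out_a, out_b)
```

In this sketch the weights act as an extra information channel, which is why they stop being pure importance indicators.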