Attention-based methods have played an important role in model interpretation, where the calculated attention weights are expected to highlight the critical parts of inputs (e.g., keywords in sentences). However, recent research has pointed out that attention-as-importance interpretations often do not work as well as we expect. For example, learned attention weights sometimes highlight less meaningful tokens such as "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. Consequently, a debate over the effectiveness of attention-based interpretations has arisen. In this paper, we reveal that one root cause of this phenomenon can be ascribed to combinatorial shortcuts, meaning that, in addition to the highlighted parts, the attention weights themselves may carry extra information that can be exploited by models downstream of the attention layers. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods effectively improve the interpretability of attention mechanisms on a variety of datasets.
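To make the notion of a combinatorial shortcut concrete, the following is a minimal NumPy sketch (an illustrative toy, not taken from the paper): the tokens carry no label information at all, yet a downstream readout of the attention-weighted sum recovers the label perfectly, because the pattern of attention weights itself encodes the label rather than pointing at genuinely important content.

```python
import numpy as np

rng = np.random.default_rng(0)
n, seq_len = 1000, 4

# Token embeddings carry no label information: every example uses the same
# tokens, each a one-hot encoding of its position.
tokens = np.tile(np.eye(seq_len)[None, :, :], (n, 1, 1))   # (n, seq_len, seq_len)
labels = rng.integers(0, 2, size=n)

# A "shortcut" attention head: it ignores token content and simply puts most
# of its mass on position 0 for class 0 and position 1 for class 1, so the
# weights themselves encode the label.
logits = np.zeros((n, seq_len))
logits[np.arange(n), labels] = 5.0
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# The downstream model only sees the attention-weighted sum of tokens, yet it
# can read off the label, because the weight pattern (not the highlighted
# content) differs between classes.
pooled = (attn[:, :, None] * tokens).sum(axis=1)            # (n, seq_len)
pred = (pooled[:, 1] > pooled[:, 0]).astype(int)
print("accuracy from content-free tokens:", (pred == labels).mean())  # ~1.0
```

In this toy setting the attention weights act as an information channel of their own, which is exactly why they can cease to be pure importance indicators.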