There has been a growing interest in interpreting the underlying dynamics of Transformers. While self-attention patterns were initially deemed the primary option, recent studies have shown that integrating other components can yield more accurate explanations. This paper introduces a novel token attribution analysis method that incorporates all the components in the encoder block and aggregates this information throughout the layers. Through extensive quantitative and qualitative experiments, we demonstrate that our method can produce faithful and meaningful global token attributions. Our experiments reveal that incorporating almost every encoder component results in increasingly accurate analysis in both local (single layer) and global (whole model) settings. Our global attribution analysis significantly outperforms previous methods on various tasks in terms of correlation with gradient-based saliency scores. Our code is freely available at https://github.com/mohsenfayyaz/GlobEnc.
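As a rough illustration of the "aggregates this information throughout the layers" step, the sketch below chains per-layer token attribution matrices with rollout-style matrix multiplication. It is a minimal, assumed example only: the function name, the identity-mixing treatment of the residual connection, and the use of random matrices are illustrative choices, not the paper's exact formulation, which derives each layer's attribution matrix from the full encoder block rather than from raw attention.

```python
import numpy as np

def aggregate_attributions(layer_attributions, add_residual=True):
    """Rollout-style aggregation of per-layer token attribution matrices.

    layer_attributions: list of (seq_len, seq_len) arrays, one per encoder
    layer, where entry [i, j] scores how much output token i draws on
    input token j within that layer.
    """
    rollout = None
    for attr in layer_attributions:
        attr = np.asarray(attr, dtype=float)
        if add_residual:
            # Crudely account for the residual connection by mixing in the identity.
            attr = attr + np.eye(attr.shape[0])
        # Row-normalize so each output token's attributions sum to 1.
        attr = attr / attr.sum(axis=-1, keepdims=True)
        # Chain attributions through the layers via matrix multiplication.
        rollout = attr if rollout is None else attr @ rollout
    return rollout  # (seq_len, seq_len): global token-to-token attribution

# Toy usage: three layers over a 4-token sequence with random local scores.
rng = np.random.default_rng(0)
layers = [rng.random((4, 4)) for _ in range(3)]
print(aggregate_attributions(layers).round(3))
```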