Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. But despite their ease of interpretation, these weights are not faithful to the models' decisions as they are only one part of an encoder, and other components in the encoder layer can have considerable impact on information mixing in the output representations. In this work, by expanding the scope of analysis to the whole encoder block, we propose Value Zeroing, a novel context mixing score customized for Transformers that provides us with a deeper understanding of how information is mixed at each encoder layer. We demonstrate the superiority of our context mixing score over other analysis methods through a series of complementary evaluations with different viewpoints based on linguistically informed rationales, probing, and faithfulness analysis.
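The abstract only names the method, so here is a minimal sketch of the core idea behind a Value-Zeroing-style context mixing score: zero out one token's value vector inside an encoder layer, recompute the layer, and measure how much every other token's output representation changes. The toy single-head layer, the helper names (`ToyEncoderLayer`, `value_zeroing_scores`), and the cosine-distance choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a Value-Zeroing-style context mixing score (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoderLayer(nn.Module):
    """A toy single-head Transformer encoder layer: attention + FFN with residuals and LayerNorm."""

    def __init__(self, d_model: int = 16, d_ff: int = 32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, zero_value_of: int | None = None) -> torch.Tensor:
        # x: (seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        if zero_value_of is not None:
            v = v.clone()
            v[zero_value_of] = 0.0  # zero the value vector of one token
        attn = F.softmax(q @ k.T / q.size(-1) ** 0.5, dim=-1)
        h = self.ln1(x + self.out(attn @ v))  # attention output + residual + LayerNorm
        return self.ln2(h + self.ffn(h))      # FFN + residual + LayerNorm


def value_zeroing_scores(layer: ToyEncoderLayer, x: torch.Tensor) -> torch.Tensor:
    """Return a (seq_len, seq_len) matrix whose entry (i, j) measures how much
    token i's output representation changes when token j's value vector is zeroed."""
    with torch.no_grad():
        original = layer(x)  # (seq_len, d_model)
        seq_len = x.size(0)
        scores = torch.zeros(seq_len, seq_len)
        for j in range(seq_len):
            altered = layer(x, zero_value_of=j)
            # cosine distance between original and perturbed per-token outputs
            scores[:, j] = 1 - F.cosine_similarity(original, altered, dim=-1)
    # row-normalize so each token's mixing scores sum to 1
    return scores / scores.sum(dim=-1, keepdim=True).clamp_min(1e-9)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = ToyEncoderLayer()
    x = torch.randn(5, 16)  # 5 toy token representations
    print(value_zeroing_scores(layer, x))
```

Because the whole layer (attention output projection, residual connections, LayerNorm, and FFN) is recomputed after zeroing, the resulting scores reflect the full encoder block rather than the raw attention weights alone, which is the contrast the abstract draws.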