Transformers are widely used in natural language processing, where they consistently achieve state-of-the-art performance. This is mainly due to their attention-based architecture, which allows them to model rich linguistic relations between (sub)words. However, transformers are difficult to interpret. The ability to provide reasoning for its decisions is an important property for a model deployed in domains where human lives are affected. With transformers finding wide use in such fields, the need for interpretability techniques tailored to them arises. We propose a new technique that selects the most faithful attention-based interpretation among the many that can be obtained by combining different head, layer and matrix operations. In addition, we introduce two variations that (i) reduce the computational complexity, making the method faster and more environmentally friendly, and (ii) improve performance on multi-label data. We further propose a new faithfulness metric that is better suited to transformer models and exhibits high correlation with the area under the precision-recall curve computed on ground-truth rationales. We validate the utility of our contributions with a series of quantitative and qualitative experiments on seven datasets.
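To make the selection idea concrete, the following is a minimal, hypothetical sketch of the procedure described above: candidate attention-based interpretations are generated by combining different head-aggregation, layer-aggregation and matrix operations, and the combination yielding the most faithful explanation is kept. The specific operation names, array shapes and the faithfulness_score callable are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative choices of operations; the actual set used in the paper may differ.
HEAD_OPS = {"mean": lambda a: a.mean(axis=1), "max": lambda a: a.max(axis=1)}
LAYER_OPS = {"mean": lambda a: a.mean(axis=0), "last": lambda a: a[-1]}
MATRICES = ["attention", "attention_x_gradient"]  # which matrix to aggregate

def candidate_interpretations(attn, grads):
    """Yield per-token importance scores for every combination of
    matrix, head operation and layer operation.
    attn, grads: arrays of shape (layers, heads, tokens)."""
    for m in MATRICES:
        base = attn if m == "attention" else attn * grads
        for h_name, h_op in HEAD_OPS.items():
            for l_name, l_op in LAYER_OPS.items():
                scores = l_op(h_op(base))  # shape: (tokens,)
                yield (m, h_name, l_name), scores

def most_faithful(attn, grads, faithfulness_score):
    """Select the combination whose token scores maximize a
    user-supplied faithfulness metric (hypothetical callable)."""
    return max(candidate_interpretations(attn, grads),
               key=lambda cand: faithfulness_score(cand[1]))
```

In this sketch the two proposed variations would correspond to (i) scoring only a subset of the combinations to cut computation and (ii) evaluating faithfulness per label for multi-label data; both are omitted here for brevity.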