Attention mechanisms have played a crucial role in the development of complex architectures such as Transformers in natural language processing. However, Transformers remain hard to interpret and are often regarded as black boxes. This paper assesses how attention coefficients from Transformers can help provide interpretability. A new attention-based interpretability method called CLaSsification-Attention (CLS-A) is proposed. CLS-A computes an interpretability score for each word from the distribution of attention coefficients associated with the part of the Transformer architecture specific to the classification task. A human-grounded experiment is conducted to evaluate CLS-A and compare it to other interpretability methods. The experimental protocol relies on the capacity of an interpretability method to provide explanations in line with human reasoning, and its design includes measuring the reaction times and correct-response rates of human subjects. CLS-A performs comparably to standard interpretability methods in terms of average participant reaction time and accuracy. Its lower computational cost compared to other interpretability methods and its availability by design within the classifier make it particularly attractive. Data analysis also highlights the link between the probability score of a classifier prediction and adequate explanations. Finally, our work confirms the relevance of CLS-A and shows the extent to which self-attention contains rich information for explaining Transformer classifiers.
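The sketch below illustrates the kind of score the abstract describes, assuming CLS-A is derived from the attention paid by the classification token ([CLS]) to each input token; the model name, the choice of layers, and the averaging over heads and layers are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative sketch: per-token scores from the attention the [CLS] token
# (the part of the encoder used for classification) assigns to each word.
# "bert-base-uncased" is a placeholder; in practice a classifier fine-tuned
# on the target task would be used.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, output_attentions=True
)

text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
# Keep the attention rows originating from the [CLS] token (query position 0),
# then average over layers and heads to get one score per token (assumed aggregation).
cls_attention = torch.stack(outputs.attentions)[:, 0, :, 0, :]  # (layers, heads, seq_len)
scores = cls_attention.mean(dim=(0, 1))                         # (seq_len,)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, scores.tolist()):
    print(f"{token:>12s}  {score:.3f}")
```

Because these scores come directly from the forward pass of the classifier, no additional backward passes or input perturbations are required, which is the source of the low computational cost mentioned above.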