Neural network architectures in natural language processing often use attention mechanisms to produce probability distributions over input token representations. Attention has empirically been demonstrated to improve performance in various tasks, while its weights have been extensively used as explanations for model predictions. However, recent studies (Jain and Wallace, 2019; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019) have shown that it cannot generally be considered a faithful explanation (Jacovi and Goldberg, 2020) across encoders and tasks. In this paper, we seek to improve the faithfulness of attention-based explanations for text classification. We achieve this by proposing a new family of Task-Scaling (TaSc) mechanisms that learn task-specific non-contextualised information to scale the original attention weights. Evaluation tests for explanation faithfulness show that the three proposed variants of TaSc improve attention-based explanations across two attention mechanisms, five encoders and five text classification datasets, without sacrificing predictive performance. Finally, we demonstrate that TaSc consistently provides more faithful attention-based explanations compared to three widely-used interpretability techniques.
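To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of what "learning task-specific non-contextualised information to scale the original attention weights" could look like in PyTorch. It assumes a simple linear form: a learned task vector is dotted with each token's (non-contextualised) word embedding to produce a per-token scalar that rescales the raw attention scores before normalisation. All names here (TaScScaler, raw_scores, word_embeds, hidden_states) are illustrative.

```python
import torch
import torch.nn as nn


class TaScScaler(nn.Module):
    """Illustrative task-scaling module: rescales attention scores with
    task-specific, non-contextualised token information (a learned vector
    dotted with each token's word embedding)."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # Learned task-specific vector u. It is "non-contextualised" in the
        # sense that it interacts with each token's embedding in isolation,
        # independently of the surrounding context.
        self.u = nn.Parameter(torch.randn(embed_dim) * 0.02)

    def forward(self, attn_scores: torch.Tensor,
                embeddings: torch.Tensor) -> torch.Tensor:
        # attn_scores: (batch, seq_len) unnormalised attention scores
        # embeddings:  (batch, seq_len, embed_dim) word embeddings
        s = embeddings @ self.u               # (batch, seq_len) token scalars
        scaled = attn_scores * s              # rescale the original scores
        return torch.softmax(scaled, dim=-1)  # renormalise to a distribution


if __name__ == "__main__":
    B, L, D, H = 2, 5, 300, 128
    raw_scores = torch.randn(B, L)        # scores from any attention mechanism
    word_embeds = torch.randn(B, L, D)    # non-contextualised embeddings
    hidden_states = torch.randn(B, L, H)  # encoder outputs

    scaler = TaScScaler(embed_dim=D)
    weights = scaler(raw_scores, word_embeds)                     # (B, L)
    context = (weights.unsqueeze(-1) * hidden_states).sum(dim=1)  # (B, H)
    print(weights.sum(dim=-1))  # each row sums to ~1.0
```

Under this sketch, the scaled weights can be used both for the downstream prediction (via the weighted context vector) and as the explanation itself, which is why the scaling can improve faithfulness without requiring changes to the encoder or the prediction head.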