In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision. By computing the rank correlation between attention weights and the scores produced by a small sample of these methods, previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience. To investigate what measures of rank correlation can reliably conclude, we comprehensively compare feature-additive methods, including attention-based explanations, across several neural architectures and tasks. In most cases, we find that none of our chosen methods agree. Therefore, we argue that rank correlation is largely uninformative and does not measure the quality of feature-additive methods. Additionally, the range of conclusions a practitioner may draw from a single explainability algorithm is limited.
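As a minimal sketch of the comparison described above (not the paper's own code), the snippet below computes Kendall's tau rank correlation between two hypothetical per-token attribution vectors, e.g., attention weights versus scores from another feature-additive method; the token scores are invented for illustration.

```python
# Illustrative sketch: comparing two feature-additive explanations for the
# same input by rank correlation (assumed scores, not real model output).
from scipy.stats import kendalltau

# Hypothetical per-token salience scores for a five-token input,
# produced by two different explanation methods.
attention_weights = [0.05, 0.10, 0.08, 0.32, 0.45]   # e.g., attention-based scores
other_attributions = [0.01, 0.22, 0.05, 0.15, 0.40]  # e.g., LIME or Integrated Gradients

# Kendall's tau measures how similarly the two methods *rank* the tokens,
# ignoring the magnitude of the scores themselves.
tau, p_value = kendalltau(attention_weights, other_attributions)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```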