A principle behind dozens of attribution methods is to take, as a feature's attribution, the difference between the model's predictions before and after that input feature (here, a token) is removed. A popular Input Marginalization (IM) method (Kim et al., 2020) uses BERT to replace a token, yielding more plausible counterfactuals. While Kim et al. (2020) reported that IM is effective, we find this conclusion unconvincing, as the DeletionBERT metric used in their paper is biased towards IM. Importantly, this bias exists in Deletion-based metrics, including Insertion, Sufficiency, and Comprehensiveness. Furthermore, our rigorous evaluation using 6 metrics and 3 datasets finds no evidence that IM is better than a Leave-One-Out (LOO) baseline. We identify two reasons why IM is not better than LOO: (1) deleting a single word from the input only marginally reduces a classifier's accuracy; and (2) under IM, a highly predictable word is always given near-zero attribution, regardless of its true importance to the classifier. In contrast, using BERT to make LIME samples more natural consistently improves LIME's attribution accuracy under several ROAR metrics.
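To make the removal-based principle concrete, below is a minimal sketch of Leave-One-Out (LOO) attribution. It assumes a hypothetical `classify(text) -> float` returning the classifier's probability for the target class; this is an illustrative stand-in, not the authors' implementation or any specific library API.

```python
# Minimal LOO attribution sketch: score each token by the drop in the
# target-class probability when that token is deleted from the input.
# `classify` is a hypothetical callable supplied by the user.
from typing import Callable, List, Tuple


def loo_attributions(tokens: List[str],
                     classify: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (token, attribution) pairs, where attribution is the
    prediction difference before vs. after removing that token."""
    full_prob = classify(" ".join(tokens))
    scores = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # input with one token deleted
        scores.append((tok, full_prob - classify(" ".join(reduced))))
    return scores


if __name__ == "__main__":
    # Toy stand-in classifier: probability rises with occurrences of "good".
    toy = lambda text: min(1.0, 0.5 + 0.25 * text.split().count("good"))
    print(loo_attributions("the movie was good".split(), toy))
```

IM differs from this sketch in the replacement step: instead of deleting the token, it marginalizes the prediction over BERT-proposed substitute tokens, which is what yields the more plausible counterfactuals discussed above.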