With recent advances in natural language processing, state-of-the-art models and datasets have grown to an extensive scale, which challenges the application of sample-based explanation methods in many respects, such as explanation interpretability, efficiency, and faithfulness. In this work, we improve the interpretability of explanations, for the first time, by allowing arbitrary text sequences as the explanation unit. On top of this, we implement a Hessian-free method with a guarantee of model faithfulness. Finally, to compare our method against others, we propose a semantics-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or retraining measures. Empirical results on multiple real datasets demonstrate that the proposed method outperforms popular explanation techniques such as Influence Functions and TracIn under semantic evaluation.
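As context for the Hessian-free family of methods referenced above, the sketch below illustrates a TracIn-style influence score, which approximates a training example's influence on a test example by the dot product of their loss gradients accumulated over saved checkpoints, avoiding the Hessian inversion that Influence Functions require. This is a minimal sketch, not the paper's implementation; the model, checkpoint paths, and loss function are illustrative assumptions.

```python
# Minimal sketch of a TracIn-style, Hessian-free influence score.
# Assumptions (not from the paper): a PyTorch model, a list of saved
# state-dict checkpoints, and a per-example loss function are available.
import torch

def example_grad(model, loss_fn, x, y):
    """Flattened gradient of the per-example loss w.r.t. model parameters."""
    model.zero_grad()
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_influence(model, checkpoint_paths, loss_fn, train_ex, test_ex, lr=1e-3):
    """Sum over checkpoints of lr * <grad(train loss), grad(test loss)>."""
    score = 0.0
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path))  # assumed saved via torch.save(model.state_dict(), path)
        g_train = example_grad(model, loss_fn, *train_ex)
        g_test = example_grad(model, loss_fn, *test_ex)
        score += lr * torch.dot(g_train, g_test).item()
    return score
```

In practice, the learning rate used at each checkpoint would replace the single `lr` constant here, and for large models the gradients are usually restricted to a subset of layers to keep the dot products tractable.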