Recent research on model interpretability in natural language processing extensively uses feature scoring methods to identify which parts of the input are most important for a model's prediction (i.e. the explanation or rationale). However, prior work has shown that there is no clear best scoring method across text classification tasks, while practitioners typically also have to make several ad-hoc choices regarding the length and the type of the rationale (e.g. short or long, contiguous or not). Motivated by this, we propose a simple yet effective and flexible method that optimally selects, for each data instance: (1) the feature scoring method; (2) the length; and (3) the type of the rationale. Our method is inspired by input erasure approaches to interpretability, which assume that the most faithful rationale for a prediction is the one whose removal from the input yields the largest difference between the model's output distribution on the full text and on the text with the rationale erased. Evaluation on four standard text classification datasets shows that our proposed method provides more faithful, comprehensive and highly sufficient explanations than using a fixed feature scoring method, rationale length and type. More importantly, we demonstrate that with our approach a practitioner is not required to make any of these ad-hoc choices in order to extract faithful rationales.
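To make the selection criterion concrete, below is a minimal sketch in Python of the input-erasure idea: each candidate rationale (e.g. produced by different scoring methods, lengths and types) is scored by how much erasing it shifts the model's output distribution, and the candidate with the largest shift is chosen. This assumes a HuggingFace-style sequence classifier; `select_rationale`, the `candidates` structure and the probability-drop score are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def select_rationale(model, tokenizer, text, candidates):
    """Pick the candidate rationale whose erasure most changes the
    model's output distribution (input-erasure criterion).

    `candidates` is a list of token-index sets, e.g. one per
    (scoring method, length, type) combination. Hypothetical helper,
    not the authors' actual API."""
    model.eval()
    tokens = tokenizer.tokenize(text)

    def predict(tok_list):
        # Re-encode the (possibly reduced) token sequence and return
        # the model's output distribution over classes.
        ids = tokenizer.convert_tokens_to_ids(tok_list)
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits
        return F.softmax(logits, dim=-1).squeeze(0)

    full_dist = predict(tokens)
    best, best_score = None, float("-inf")
    for rationale in candidates:
        # Erase the rationale tokens and re-run the model.
        reduced = [t for i, t in enumerate(tokens) if i not in rationale]
        reduced_dist = predict(reduced)
        # One common choice of difference: the drop in probability of
        # the originally predicted class; a divergence such as KL is
        # another option.
        score = (full_dist - reduced_dist)[full_dist.argmax()].item()
        if score > best_score:
            best, best_score = rationale, score
    return best, best_score
```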