Models that generate extractive rationales (i.e., subsets of features) or natural language explanations (NLEs) for their predictions are important for explainable AI. While an extractive rationale provides a quick view of the features most responsible for a prediction, an NLE allows for a comprehensive description of the decision-making process behind a prediction. However, current models that generate the best extractive rationales or NLEs often fall behind the state-of-the-art (SOTA) in terms of task performance. In this work, we bridge this gap by introducing RExC, a self-rationalizing framework that grounds its predictions and two complementary types of explanations (NLEs and extractive rationales) in background knowledge. Our framework improves over previous methods by: (i) reaching SOTA task performance while also providing explanations, (ii) providing two types of explanations, whereas existing models usually provide only one, and (iii) surpassing the previous SOTA in the quality of both types of explanations by a large margin. Furthermore, a perturbation analysis of RExC shows a high degree of association between explanations and predictions, a necessary property of faithful explanations.