Explaining the predictions of AI models is paramount in safety-critical applications, such as in legal or medical domains. One form of explanation for a prediction is an extractive rationale, i.e., a subset of features of an instance that lead the model to give its prediction on that instance. For example, the subphrase ``he stole the mobile phone'' can be an extractive rationale for the prediction of ``Theft''. Previous works on generating extractive rationales usually employ a two-phase model: a selector that selects the most important features (i.e., the rationale) followed by a predictor that makes the prediction based exclusively on the selected features. One disadvantage of these works is that the main signal for learning to select features comes from the comparison of the answers given by the predictor to the ground-truth answers. In this work, we propose to squeeze more information from the predictor via an information calibration method. More precisely, we train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction. The first model is used as a guide for the second model. We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features. In addition, for natural language tasks, we propose a language-model-based regularizer to encourage the extraction of fluent rationales. Experimental results on a sentiment analysis task, a hate speech recognition task as well as on three tasks from the legal domain show the effectiveness of our approach to rationale extraction.