Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised manner on human-annotated explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model. Faithfulness is one of several so-called diagnostic properties, which prior work has identified as useful for gauging the quality of an explanation without requiring annotations. Other diagnostic properties are Data Consistency, which measures how similar the explanations are for similar input instances, and Confidence Indication, which measures whether the explanation reflects the confidence of the model. In this work, we show how to directly optimise for these diagnostic properties when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.
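Schematically, such diagnostics-guided training can be viewed as minimising a weighted joint objective in which each diagnostic property contributes a differentiable loss term alongside the downstream task loss. The formulation below is an illustrative sketch only; the symbols and weighting scheme are assumptions for exposition and are not taken from the paper:

$$\mathcal{L} \;=\; \mathcal{L}_{\text{task}} \;+\; \lambda_{F}\,\mathcal{L}_{\text{Faithfulness}} \;+\; \lambda_{C}\,\mathcal{L}_{\text{Confidence}} \;+\; \lambda_{D}\,\mathcal{L}_{\text{Consistency}},$$

where $\lambda_{F}$, $\lambda_{C}$, and $\lambda_{D}$ are hypothetical hyperparameters controlling how strongly each diagnostic property is enforced relative to the downstream task loss during training.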