An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale extraction should be faithful (reflective of the LM's actual behavior) and plausible (convincing to humans), without compromising the LM's (i.e., task model's) task performance. Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework that generalizes rationale extractor optimization as follows: (1) specify an architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using the selected objectives. UNIREX enables replacing prior works' heuristic design choices with a generic learned rationale extractor in (1) and optimizing it for all three desiderata in (2)-(3). To facilitate comparison between methods with respect to multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric. Across five text classification datasets, our best UNIREX configuration outperforms baselines by an average of 32.9% NRG. Furthermore, we find that UNIREX-trained rationale extractors can even generalize to unseen datasets and tasks.
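The abstract does not define NRG precisely; the sketch below is one plausible reading, assuming NRG min-max normalizes each desideratum's score across the compared methods (so the best method gets 1 and the worst gets 0 on each metric) and then averages the normalized scores per method. The function name and the example numbers are illustrative, not from the paper.

```python
def normalized_relative_gain(scores):
    """Hypothetical NRG-style aggregation.

    scores: dict mapping method name -> list of per-desideratum metric
    values (higher is better for every metric). Returns a dict mapping
    method name -> averaged normalized score in [0, 1].
    """
    methods = list(scores)
    n_metrics = len(next(iter(scores.values())))
    nrg = {m: 0.0 for m in methods}
    for j in range(n_metrics):
        col = [scores[m][j] for m in methods]
        lo, hi = min(col), max(col)
        for m in methods:
            # Min-max normalize this metric relative to all methods;
            # if every method ties, give everyone full credit.
            norm = 1.0 if hi == lo else (scores[m][j] - lo) / (hi - lo)
            nrg[m] += norm / n_metrics
    return nrg

# Illustrative scores on (task performance, faithfulness, plausibility):
example = {
    "attribution": [0.90, 0.40, 0.50],
    "select-predict": [0.80, 0.80, 0.60],
    "unirex": [0.90, 0.80, 0.70],
}
print(normalized_relative_gain(example))
```

Because each metric is normalized before averaging, NRG lets metrics on different scales (accuracy, faithfulness deltas, human plausibility ratings) contribute equally to a single comparable score.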