Following how humans communicate, free-text rationales aim to use natural language to explain neural language model (LM) behavior. However, the unconstrained nature of free-text rationales makes them prone to hallucination, so it is important to have metrics for free-text rationale quality. Existing free-text rationale metrics measure how consistent the rationale is with the LM's predicted label, but there is no protocol for assessing such metrics' reliability. Thus, we propose FRAME, a framework for evaluating rationale-label consistency (RLC) metrics for free-text rationales. FRAME is based on three axioms: (1) good metrics should yield the highest scores for reference rationales, which maximize RLC by construction; (2) good metrics should be appropriately sensitive to semantic perturbation of rationales; and (3) good metrics should be robust to variation in the LM's task performance. Across three text classification datasets, we show that existing RLC metrics cannot satisfy all three FRAME axioms, since they are implemented via model pretraining, which muddles the metric's signal. We then introduce a non-pretraining RLC metric that greatly outperforms baselines on axioms (1) and (3) while performing competitively on (2). Finally, we discuss the limitations of using RLC to evaluate free-text rationales.
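To make the three axioms concrete, the following is a minimal sketch in Python of how a FRAME-style evaluation protocol could test a candidate RLC metric. All names, data shapes, and thresholds here are illustrative assumptions rather than the paper's actual implementation; an RLC metric is modeled simply as a function from a (rationale, label) pair to a score.

```python
from typing import Callable, List, Tuple

# Hypothetical type: an RLC metric maps a (rationale, predicted label)
# pair to a consistency score. This signature is an assumption for
# illustration, not the paper's interface.
RLCMetric = Callable[[str, str], float]

def mean_score(metric: RLCMetric, pairs: List[Tuple[str, str]]) -> float:
    """Average metric score over (rationale, label) pairs."""
    return sum(metric(r, y) for r, y in pairs) / len(pairs)

def axiom_1(metric: RLCMetric,
            reference: List[Tuple[str, str]],
            generated: List[Tuple[str, str]]) -> bool:
    """Axiom 1: reference rationales, which maximize RLC by
    construction, should receive the highest scores."""
    return mean_score(metric, reference) > mean_score(metric, generated)

def axiom_2(metric: RLCMetric,
            originals: List[Tuple[str, str]],
            perturbed: List[Tuple[str, str]]) -> bool:
    """Axiom 2: semantically perturbing a rationale (e.g., negating
    its key statement) should noticeably lower its score."""
    return mean_score(metric, originals) > mean_score(metric, perturbed)

def axiom_3(scores_strong_lm: List[float],
            scores_weak_lm: List[float],
            tolerance: float = 0.05) -> bool:
    """Axiom 3: scores should be robust to variation in the LM's task
    performance, i.e., roughly stable between stronger and weaker LMs.
    The tolerance value is an arbitrary illustrative choice."""
    gap = abs(sum(scores_strong_lm) / len(scores_strong_lm)
              - sum(scores_weak_lm) / len(scores_weak_lm))
    return gap < tolerance
```

Under this reading, a candidate metric satisfies FRAME only if all three checks hold; a metric's reported failure would correspond to at least one of these predicates returning False.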