Free-text rationales aim to explain neural language model (LM) behavior more flexibly and intuitively via natural language. To ensure rationale quality, it is important to have metrics for measuring rationales' faithfulness (reflects LM's actual behavior) and plausibility (convincing to humans). All existing free-text rationale metrics are based on simulatability (association between rationale and LM's predicted label), but there is no protocol for assessing such metrics' reliability. To investigate this, we propose FRAME, a framework for evaluating free-text rationale simulatability metrics. FRAME is based on three axioms: (1) good metrics should yield highest scores for reference rationales, which maximize rationale-label association by construction; (2) good metrics should be appropriately sensitive to semantic perturbation of rationales; and (3) good metrics should be robust to variation in the LM's task performance. Across three text classification datasets, we show that existing simulatability metrics cannot satisfy all three FRAME axioms, since they are implemented via model pretraining which muddles the metric's signal. We introduce a non-pretraining simulatability variant that improves performance on (1) and (3) by an average of 41.7% and 42.9%, respectively, while performing competitively on (2).
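To make "simulatability" and axiom (1) concrete, here is a minimal sketch. It is not FRAME's implementation. It assumes one common form of the metric: the gain in a simulator's accuracy at predicting the LM's label when the rationale is provided, compared with the input alone. The abstract only defines simulatability as the association between rationale and predicted label, so this specific formula, the keyword-based `toy_simulator`, and the example data are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence

# A simulator maps (input text, optional rationale) to a predicted label.
Simulator = Callable[[str, Optional[str]], str]


def accuracy(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions that match the targets."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def simulatability(
    simulator: Simulator,
    inputs: Sequence[str],
    rationales: Sequence[str],
    lm_labels: Sequence[str],
) -> float:
    """Assumed metric: accuracy with rationale minus accuracy without it."""
    with_r = [simulator(x, r) for x, r in zip(inputs, rationales)]
    without_r = [simulator(x, None) for x in inputs]
    return accuracy(with_r, lm_labels) - accuracy(without_r, lm_labels)


def toy_simulator(text: str, rationale: Optional[str]) -> str:
    """Hypothetical stand-in for a trained simulator: a keyword rule.

    It reads the rationale when one is given, otherwise the input text.
    """
    source = rationale if rationale is not None else text
    return "positive" if "good" in source.lower() else "negative"


# Made-up data: labels predicted by some LM, plus two sets of rationales.
inputs = [
    "The plot was thin but I stayed to the end.",
    "Not a film I would watch twice.",
]
lm_labels = ["positive", "negative"]

# Reference rationales are tied to the label by construction.
reference = ["The review is good overall.", "The review is bad overall."]
# A semantic perturbation: the rationales now contradict the labels.
perturbed = ["The review is bad overall.", "The review is good overall."]

ref_score = simulatability(toy_simulator, inputs, reference, lm_labels)
pert_score = simulatability(toy_simulator, inputs, perturbed, lm_labels)

# Axiom (1) asks that reference rationales receive the highest score.
print(ref_score, pert_score)
```

In this toy setup the reference rationales score higher than the perturbed ones, which is the ordering axioms (1) and (2) expect from a well-behaved metric. A real evaluation would replace `toy_simulator` with a learned model, which is where the abstract locates the problem with pretraining-based implementations.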