Following how humans communicate, free-text rationales aim to use natural language to explain neural language model (LM) behavior. However, the unconstrained nature of free-text rationales makes them prone to hallucination, so it is important to have metrics for free-text rationale quality. Existing free-text rationale metrics measure how consistent the rationale is with the LM's predicted label, but there is no protocol for assessing such metrics' reliability. Thus, we propose FRAME, a framework for evaluating rationale-label consistency (RLC) metrics for free-text rationales. FRAME is based on three axioms: (1) good metrics should yield the highest scores for reference rationales, which maximize RLC by construction; (2) good metrics should be appropriately sensitive to semantic perturbation of rationales; and (3) good metrics should be robust to variation in the LM's task performance. Across three text classification datasets, we show that existing RLC metrics cannot satisfy all three FRAME axioms, since they are implemented via model pretraining, which muddles the metric's signal. We then introduce a non-pretraining RLC metric that greatly outperforms baselines on axioms (1) and (3) while performing competitively on (2). Finally, we discuss the limitations of using RLC to evaluate free-text rationales.
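To make the three axioms concrete, the following is a minimal sketch in Python of how a FRAME-style evaluation protocol could test a candidate RLC metric. All names, data shapes, and thresholds here are illustrative assumptions rather than the paper's actual implementation; an RLC metric is modeled simply as a function from a (rationale, label) pair to a score.

```python
from typing import Callable, List, Tuple

# Hypothetical type: an RLC metric maps a (rationale, predicted label)
# pair to a consistency score. This signature is an assumption for
# illustration, not the paper's interface.
RLCMetric = Callable[[str, str], float]

def mean_score(metric: RLCMetric, pairs: List[Tuple[str, str]]) -> float:
    """Average metric score over (rationale, label) pairs."""
    return sum(metric(r, y) for r, y in pairs) / len(pairs)

def axiom_1(metric: RLCMetric,
            reference: List[Tuple[str, str]],
            generated: List[Tuple[str, str]]) -> bool:
    """Axiom 1: reference rationales, which maximize RLC by
    construction, should receive the highest scores."""
    return mean_score(metric, reference) > mean_score(metric, generated)

def axiom_2(metric: RLCMetric,
            originals: List[Tuple[str, str]],
            perturbed: List[Tuple[str, str]]) -> bool:
    """Axiom 2: semantically perturbing a rationale (e.g., negating
    its key statement) should noticeably lower its score."""
    return mean_score(metric, originals) > mean_score(metric, perturbed)

def axiom_3(scores_strong_lm: List[float],
            scores_weak_lm: List[float],
            tolerance: float = 0.05) -> bool:
    """Axiom 3: scores should be robust to variation in the LM's task
    performance, i.e., roughly stable between stronger and weaker LMs.
    The tolerance value is an arbitrary illustrative choice."""
    gap = abs(sum(scores_strong_lm) / len(scores_strong_lm)
              - sum(scores_weak_lm) / len(scores_weak_lm))
    return gap < tolerance
```

Under this reading, a candidate metric satisfies FRAME only if all three checks hold; a metric's reported failure would correspond to at least one of these predicates returning False.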