FRAME: Evaluating Simulatability Metrics for Free-Text Rationales - 专知论文 (Zhuanzhi Papers)


July 2, 2022

FRAME: Evaluating Simulatability Metrics for Free-Text Rationales


Aaron Chan, Shaoliang Nie, Liang Tan, Xiaochang Peng, Hamed Firooz, Maziar Sanjabi, Xiang Ren

from arxiv, 16 pages, 18 figures

Free-text rationales aim to explain neural language model (LM) behavior more flexibly and intuitively via natural language. To ensure rationale quality, it is important to have metrics for measuring rationales' faithfulness (reflects LM's actual behavior) and plausibility (convincing to humans). All existing free-text rationale metrics are based on simulatability (association between rationale and LM's predicted label), but there is no protocol for assessing such metrics' reliability. To investigate this, we propose FRAME, a framework for evaluating free-text rationale simulatability metrics. FRAME is based on three axioms: (1) good metrics should yield highest scores for reference rationales, which maximize rationale-label association by construction; (2) good metrics should be appropriately sensitive to semantic perturbation of rationales; and (3) good metrics should be robust to variation in the LM's task performance. Across three text classification datasets, we show that existing simulatability metrics cannot satisfy all three FRAME axioms, since they are implemented via model pretraining which muddles the metric's signal. We introduce a non-pretraining simulatability variant that improves performance on (1) and (3) by an average of 41.7% and 42.9%, respectively, while performing competitively on (2).
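The abstract describes simulatability only as the association between a rationale and the LM's predicted label. As a rough illustration, the sketch below assumes one common formulation: the gain in a simulator's accuracy at predicting the LM's label when it is also given the rationale. The three helper checks mirror the axioms in spirit only. All function names, the scoring rule, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def simulatability(sim_with, sim_without, lm_labels):
    """Assumed formulation: accuracy gain of a simulator at predicting the
    LM's labels when it sees the rationale, versus when it does not."""
    acc_with = mean(int(p == y) for p, y in zip(sim_with, lm_labels))
    acc_without = mean(int(p == y) for p, y in zip(sim_without, lm_labels))
    return acc_with - acc_without


def reference_scores_highest(scores_by_rationale_type):
    """Axiom 1 (sketch): the reference rationale should get the top score."""
    best = max(scores_by_rationale_type, key=scores_by_rationale_type.get)
    return best == "reference"


def perturbation_drop(original_score, perturbed_score):
    """Axiom 2 (sketch): a positive drop means the metric reacts to a
    semantic perturbation of the rationale."""
    return original_score - perturbed_score


def spread_across_lms(scores_across_lms):
    """Axiom 3 (sketch): a smaller spread across LMs of differing task
    performance means the metric is more robust."""
    return pstdev(scores_across_lms)


# Toy data (hypothetical): four examples, binary labels predicted by the LM.
lm_labels = [1, 0, 1, 1]
sim_with = [1, 0, 1, 0]      # simulator predictions given input + rationale
sim_without = [1, 1, 0, 0]   # simulator predictions given input only

score = simulatability(sim_with, sim_without, lm_labels)
toy_scores = {"reference": 0.5, "generated": 0.2, "random": -0.1}
```

Here `score` compares 3 of 4 correct simulations with the rationale against 1 of 4 without it. How the paper actually instantiates each axiom, and how its non-pretraining variant is scored, is not specified in the abstract.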

