Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness (independent of the final answer) is difficult without reliable methods for automatic evaluation. We simply do not know how often the stated reasoning steps actually support the final end-task predictions. In this work, we present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. To evaluate ROSCOE against baseline metrics, we design a typology of reasoning errors and collect synthetic and human evaluation scores on commonly used reasoning datasets. In contrast with existing metrics, ROSCOE can measure semantic consistency, logicality, informativeness, fluency, and factuality, among other traits, by leveraging properties of step-by-step rationales. We empirically verify the strength of our metrics on five human-annotated and six programmatically perturbed diagnostic datasets, covering a diverse set of tasks that require reasoning skills, and show that ROSCOE can consistently outperform baseline metrics.
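To make the idea of an unsupervised, embedding-based step-level score concrete, below is a minimal illustrative sketch of one such metric: the average, over reasoning steps, of each step's best cosine similarity to a source-context sentence. This is not the exact ROSCOE formulation; the embedding model name, the `step_alignment_score` helper, and the aggregation choice are assumptions introduced purely for illustration.

```python
# Illustrative sketch of an unsupervised step-to-source alignment score.
# NOT the exact ROSCOE metric; model choice and aggregation are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def step_alignment_score(source_sentences, reasoning_steps,
                         model_name="all-MiniLM-L6-v2"):
    """Mean over reasoning steps of each step's best cosine similarity
    to any source sentence (higher = steps better grounded in the source)."""
    model = SentenceTransformer(model_name)
    src = model.encode(source_sentences, normalize_embeddings=True)
    steps = model.encode(reasoning_steps, normalize_embeddings=True)
    # With normalized embeddings, cosine similarity is a dot product.
    sims = steps @ src.T                      # (num_steps, num_source_sentences)
    return float(np.mean(sims.max(axis=1)))   # aggregate per-step best matches

if __name__ == "__main__":
    context = ["Tom has 3 apples.", "He buys 2 more apples."]
    chain = ["Tom starts with 3 apples.",
             "He buys 2 more, so he has 3 + 2 = 5 apples."]
    print(f"alignment: {step_alignment_score(context, chain):.3f}")
```

A score of this kind requires no reference rationale or supervision, which is what makes it applicable to arbitrary step-by-step generations; the full suite in the paper combines several such perspectives (consistency, informativeness, fluency, factuality) rather than a single alignment number.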