Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive: a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks, ranging from goal-oriented to open-ended, to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always translate into better human-LM interaction, and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.