Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback - Zhuanzhi (专知) Papers

Agent · INTERACT · Learning · Multimodal · Reinforcement Learning

November 21, 2022

Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback

Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.
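To make the reward-modelling idea in the abstract more concrete, here is a minimal sketch of a plain Bradley-Terry preference loss applied to two moments of the same episode. It is not the paper's IBT method, whose details are not given in the abstract. The function names, the scalar "utility" inputs, and the mapping of "progressed" and "regressed" annotations to a preferred moment are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bt_loss(u_later: float, u_earlier: float, progressed: bool) -> float:
    """Negative log-likelihood of one annotation under a Bradley-Terry model.

    u_later, u_earlier: hypothetical scalar utilities that a reward model
        assigns to a later and an earlier moment of the same episode.
    progressed: True if the annotator marked progress (assumed to mean the
        later moment is preferred), False if they marked regression.

    Bradley-Terry sets P(later preferred) = sigmoid(u_later - u_earlier),
    so the loss is -log of that probability, or of its complement.
    """
    d = u_later - u_earlier
    # -log(sigmoid(d)) == softplus(-d); -log(1 - sigmoid(d)) == softplus(d)
    return softplus(-d) if progressed else softplus(d)
```

Under these assumptions, minimising this loss over many annotated moments pushes the model to assign higher utility after marked progress and lower utility after marked regression. A learned utility of this kind could then stand in for a programmatic reward function.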
