Reward design is a critical part of applying reinforcement learning, and its success strongly depends on how well the reward signal frames the designer's goal and how well it assesses progress toward that goal. In many cases, the extrinsic rewards provided by the environment (e.g., the win or loss of a game) are very sparse, which makes it difficult to train agents directly. In practice, researchers usually assist agents' learning by adding auxiliary rewards. However, designing auxiliary rewards often devolves into a trial-and-error search for reward settings that produce acceptable results. In this paper, we propose to automatically generate goal-consistent intrinsic rewards for the agent to learn, such that maximizing them also maximizes the expected cumulative extrinsic reward. To this end, we introduce the concept of motivation, which captures the underlying goal of maximizing certain rewards, and propose a motivation-based reward design method. The basic idea is to shape the intrinsic rewards by minimizing the distance between the intrinsic and extrinsic motivations. We conduct extensive experiments and show that our method outperforms state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
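The abstract does not spell out how motivation is formalized. The following minimal sketch is one hedged reading, assuming the motivation of a reward signal is its per-timestep discounted return along sampled trajectories and the distance between motivations is a squared error; the tabular setting, the `discounted_returns` helper, and all parameters here are hypothetical illustrations, not the paper's actual method.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Per-timestep discounted return G_t = sum_k gamma^k * r_{t+k}."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

rng = np.random.default_rng(0)
n_states, T, gamma, lr = 10, 50, 0.99, 0.01
r_intrinsic = np.zeros(n_states)  # learnable dense reward, one entry per state

for _ in range(500):
    states = rng.integers(0, n_states, size=T)      # toy random trajectory
    r_ext = (states == n_states - 1).astype(float)  # sparse extrinsic reward at a goal state
    G_ext = discounted_returns(r_ext, gamma)        # extrinsic "motivation" (assumed form)
    G_int = discounted_returns(r_intrinsic[states], gamma)  # intrinsic "motivation"

    # Gradient of 0.5 * ||G_int - G_ext||^2 w.r.t. the reward table:
    # G_int[t] depends on r_intrinsic[states[k]] with weight gamma^(k - t) for k >= t.
    delta = G_int - G_ext
    grad = np.zeros(n_states)
    for k in range(T):
        coeffs = gamma ** (k - np.arange(k + 1))  # weights gamma^(k - t) for t = 0..k
        grad[states[k]] += np.dot(coeffs, delta[: k + 1])
    r_intrinsic -= lr * grad / T

print(np.round(r_intrinsic, 3))  # learned dense reward tends to concentrate on the goal state
```

Under this reading, fitting the intrinsic returns to the sparse extrinsic returns densifies the reward signal while keeping it consistent with the original goal, which is the alignment property the abstract claims.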