Deep reinforcement learning has achieved considerable success with the advent of trust region policy optimization (TRPO) and proximal policy optimization (PPO), owing to their scalability and efficiency. However, the pessimism of both algorithms, in that one constrains updates to a trust region while the other strictly excludes all suspicious gradients, has been shown to suppress exploration and harm the agent's performance. To address these issues, we propose a shifted Markov decision process (MDP), that is, an MDP with entropy augmentation, to encourage exploration and strengthen the agent's ability to escape from suboptima. Our method is extensible and adapts to either reward shaping or bootstrapping. Through convergence analysis, we find that controlling the temperature coefficient is crucial; when it is tuned appropriately, the method, being simple yet effective, achieves remarkable performance even when applied to other algorithms. Our experiments evaluate the augmented TRPO and PPO on MuJoCo benchmark tasks and indicate that the agent is encouraged toward higher-reward regions while maintaining a balance between exploration and exploitation. We further verify the exploration bonus of our method on two grid-world environments.
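A minimal sketch of the entropy-augmented (shifted) reward described above, assuming the shift adds a policy-entropy bonus scaled by a temperature coefficient \(\alpha\) (the symbols and the exact form are chosen here for illustration and may differ from the paper's formulation):

\[
\tilde{r}(s_t, a_t) = r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big),
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\sum_{a} \pi(a \mid s_t) \log \pi(a \mid s_t).
\]

Under the bootstrapping adaptation mentioned above, a comparable entropy bonus would enter the value target rather than the immediate reward.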