Exploration is crucial for training an optimal reinforcement learning (RL) policy, and the key is to discriminate whether a visited state is novel. Most previous work focuses on designing heuristic rules or distance metrics to check whether a state is novel, without considering that such a discrimination process can itself be learned. In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in RL by introducing an intrinsic reward output from a generative adversarial network, in which the generator provides fake samples of states that help the discriminator identify less frequently visited states. The agent is thus encouraged to visit states that the discriminator is less confident to judge as visited. GAEX is easy to implement and highly training-efficient. In our experiments, we apply GAEX to DQN, and the resulting DQN-GAEX algorithm achieves convincing performance on challenging exploration problems, including the games Venture, Montezuma's Revenge, and Super Mario Bros, without further fine-tuning of complicated learning algorithms. To our knowledge, this is the first work to employ a GAN in RL exploration problems.
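For concreteness, the sketch below illustrates one way such a GAN-based intrinsic bonus could be computed: a discriminator is trained to label recently visited states as "real" against generator fakes, and states it scores with low confidence receive a larger exploration bonus. The reward form `-log D(s)`, the network sizes, and the update schedule are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal GAEX-style sketch (illustrative assumptions throughout):
# intrinsic reward is high for states the discriminator is unsure are "visited".
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Judges whether a state looks like one the agent has already visited."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
    def forward(self, s):
        return self.net(s)

class Generator(nn.Module):
    """Produces fake states that mimic the visited-state distribution."""
    def __init__(self, noise_dim, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
    def forward(self, z):
        return self.net(z)

def intrinsic_reward(disc, state):
    # Assumed reward form: the lower D(s) (less confident the state was visited),
    # the larger the exploration bonus.
    with torch.no_grad():
        d = disc(state)
    return -torch.log(d + 1e-8)

def gan_update(disc, gen, d_opt, g_opt, visited_states, noise_dim=32):
    """One adversarial step on a batch of recently visited states."""
    bce = nn.BCELoss()
    batch = visited_states.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator: visited states are "real", generator samples are "fake".
    z = torch.randn(batch, noise_dim)
    fake_states = gen(z).detach()
    d_loss = bce(disc(visited_states), real_labels) + bce(disc(fake_states), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to fool the discriminator with fake states.
    z = torch.randn(batch, noise_dim)
    g_loss = bce(disc(gen(z)), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

In a DQN-style agent, `intrinsic_reward` would simply be added to the environment reward at each step, while `gan_update` runs periodically on states drawn from the replay buffer; both choices here are assumptions for illustration.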