MADE: Exploration via Maximizing Deviation from Explored Regions - 专知论文



June 18, 2021

MADE: Exploration via Maximizing Deviation from Explored Regions


Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

From arXiv, 28 pages, 10 figures

In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions. We add this term as an adaptive regularizer to the standard RL objective to balance exploration vs. exploitation. We pair the new objective with a provably convergent algorithm, giving rise to a new intrinsic reward that adjusts existing bonuses. The proposed intrinsic reward is easy to implement and combine with other existing RL algorithms to conduct exploration. As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods. Our code is available at https://github.com/tianjunz/MADE.
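For readers unfamiliar with the count-based baseline the abstract refers to, the following is a minimal sketch of a tabular count-based exploration bonus of the common form beta / sqrt(N(s, a)). It is not the MADE intrinsic reward, whose exact form the abstract does not state; the class name `CountBonus`, the parameter `beta`, and the helper `shaped_reward` are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict


class CountBonus:
    """Tabular count-based exploration bonus: beta / sqrt(N(s, a)).

    Illustrative baseline only (assumption): this is the generic
    count-only bonus, not the MADE intrinsic reward from the paper.
    """

    def __init__(self, beta=1.0):
        self.beta = beta                 # bonus scale (hypothetical name)
        self.counts = defaultdict(int)   # visit counts N(s, a)

    def update(self, state, action):
        """Record one visit to the (state, action) pair."""
        self.counts[(state, action)] += 1

    def bonus(self, state, action):
        """Return the bonus; unvisited pairs are treated as count 1."""
        n = max(self.counts.get((state, action), 0), 1)
        return self.beta / math.sqrt(n)


def shaped_reward(extrinsic, bonus_fn, state, action):
    """Add the exploration bonus to the environment (extrinsic) reward."""
    return extrinsic + bonus_fn.bonus(state, action)


if __name__ == "__main__":
    b = CountBonus(beta=1.0)
    for _ in range(4):
        b.update("s0", 0)
    # Frequently visited pairs earn a smaller bonus than rarely visited ones.
    print(shaped_reward(0.0, b, "s0", 0), shaped_reward(0.0, b, "s1", 0))
```

Because the bonus shrinks as a pair is visited more often, an agent maximizing the shaped reward is drawn toward rarely visited state-action pairs. According to the abstract, MADE adjusts bonuses of this kind instead of relying on counts alone.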


