进行稳定的次级目标代表性学习的活跃的等级探索 (Active Hierarchical Exploration with Stable Subgoal Representation Learning) - 专知论文

会员服务 ·

0

学成 · 表示学习 · Continuity · 稀疏 · 分层强化学习 ·

2021 年 10 月 8 日

Active Hierarchical Exploration with Stable Subgoal Representation Learning

翻译：进行稳定的次级目标代表性学习的活跃的等级探索

Siyuan Li,Jin Zhang,Jianhao Wang,Yang Yu,Chongjie Zhang

Goal-conditioned hierarchical reinforcement learning (GCHRL) provides a promising approach to solving long-horizon tasks. Recently, its success has been extended to more general settings by concurrently learning hierarchical policies and subgoal representations. Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards. As the high-level policy selects subgoals in an online learned representation space, the dynamic change of the subgoal space severely hinders effective high-level exploration. In this paper, we propose a novel regularization that contributes to both stable and efficient subgoal representation learning. Building upon the stable representation, we design measures of novelty and potential for subgoals, and develop an active hierarchical exploration strategy that seeks out new promising subgoals and states without intrinsic rewards. Experimental results show that our approach significantly outperforms state-of-the-art baselines in continuous control tasks with sparse rewards.

翻译：以目标为条件的等级强化学习(GCHL)为解决长期横向强化任务提供了很有希望的方法。最近,它的成功通过同时学习等级政策和次级目标表述方式扩展到了更一般的环境。虽然GCHL拥有通过次级目标将任务分解的优越的勘探能力,但现有的GCHL方法在长期延长的任务中挣扎,外部奖励很少,因为高级别政策学习依靠外部奖励。由于高层政策在网上学习的展示空间选择次级目标,次级目标空间的动态变化严重阻碍了高级别的有效探索。在本文件中,我们提出了有助于稳定和高效次级目标代表学习的新规范。我们以稳定的代表性为基础,设计新的措施和次级目标的潜力,并制定一项积极的等级探索战略,寻求新的有希望的次级目标和没有内在奖励的国家。实验结果显示,我们的方法大大超越了持续控制任务中以微量回报为基础的最新基线。

0

相关内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知会员服务

54+阅读 · 2019年12月22日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知

8+阅读 · 2019年12月22日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Arxiv

0+阅读 · 2021年12月2日

Hierarchical Prototype Networks for Continual Graph Representation Learning

Arxiv

0+阅读 · 2021年11月30日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Return-Based Contrastive Representation Learning for Reinforcement Learning

Arxiv

10+阅读 · 2021年2月22日

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Arxiv

8+阅读 · 2020年4月13日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

Learning Disentangled Representations for Recommendation

Learning Disentangled Representations for Recommendation

Arxiv

8+阅读 · 2019年10月31日

Hierarchical Meta Learning

Arxiv

9+阅读 · 2019年4月19日

Hierarchical Deep Multiagent Reinforcement Learning

Hierarchical Deep Multiagent Reinforcement Learning

Arxiv

8+阅读 · 2018年9月25日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

13+阅读 · 2018年6月26日

VIP会员

文章信息

相关主题

分层强化学习

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知会员服务

54+阅读 · 2019年12月22日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知

8+阅读 · 2019年12月22日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Arxiv

0+阅读 · 2021年12月2日

Hierarchical Prototype Networks for Continual Graph Representation Learning

Arxiv

0+阅读 · 2021年11月30日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Return-Based Contrastive Representation Learning for Reinforcement Learning

Arxiv

10+阅读 · 2021年2月22日

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Arxiv

8+阅读 · 2020年4月13日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

Learning Disentangled Representations for Recommendation

Learning Disentangled Representations for Recommendation

Arxiv

8+阅读 · 2019年10月31日

Hierarchical Meta Learning

Arxiv

9+阅读 · 2019年4月19日

Hierarchical Deep Multiagent Reinforcement Learning

Hierarchical Deep Multiagent Reinforcement Learning

Arxiv

8+阅读 · 2018年9月25日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

13+阅读 · 2018年6月26日

微信扫码咨询专知VIP会员