We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long-horizon control tasks and generalizes to unseen test scenarios. Functional decomposition between planning and low-level control is achieved by explicitly separating the state-action spaces across the hierarchy, which allows the integration of task-relevant knowledge per layer. We propose an RL-based planner to efficiently leverage the information in the planning layer of the hierarchy, while the control layer learns a goal-conditioned control policy. The hierarchy is trained jointly but allows for the modular transfer of policy layers across hierarchies of different agents. We experimentally show that our method generalizes across unseen test environments and can scale to 3x the horizon length compared to both learning- and non-learning-based methods. We evaluate on complex continuous control tasks with sparse rewards, including navigation and robot manipulation.
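To illustrate the functional decomposition described above, the following is a minimal sketch of a two-level hierarchy in which an RL-based planner operates on a coarse task-level state and emits subgoals, while a goal-conditioned control policy operates on the full state-action space. This is not the authors' implementation; the class names, the `task_state` abstraction, and the subgoal-horizon scheduling are hypothetical placeholders used only to convey the structure.

```python
import numpy as np

def task_state(obs: np.ndarray) -> np.ndarray:
    # Hypothetical state abstraction for the planning layer,
    # e.g. the agent's (x, y) position extracted from the full observation.
    return obs[:2]

class PlannerPolicy:
    """Planning layer: maps a task-level observation to a subgoal."""
    def act(self, planner_obs: np.ndarray) -> np.ndarray:
        # Placeholder: a trained planner would propose a reachable subgoal here.
        return planner_obs + np.random.uniform(-1.0, 1.0, size=planner_obs.shape)

class ControlPolicy:
    """Control layer: goal-conditioned policy over the full state-action space."""
    def act(self, obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # Placeholder: a trained policy would output low-level actions toward `goal`.
        return np.clip(goal - task_state(obs), -1.0, 1.0)

def rollout(env, planner: PlannerPolicy, controller: ControlPolicy,
            subgoal_horizon: int = 10, max_steps: int = 500) -> None:
    """Run one episode: the planner picks a new subgoal every `subgoal_horizon`
    steps, and the controller acts toward the current subgoal in between."""
    obs = env.reset()
    subgoal = planner.act(task_state(obs))
    for t in range(max_steps):
        if t % subgoal_horizon == 0:
            subgoal = planner.act(task_state(obs))
        obs, reward, done, info = env.step(controller.act(obs, subgoal))
        if done:
            break
```

Because the two layers interact only through the subgoal interface, a trained control policy could in principle be reused under a different planner (or vice versa), which is the modular transfer property the abstract refers to.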