We focus on learning composable policies to control a variety of physical agents with possibly different structures. Among state-of-the-art methods, prominent approaches exploit graph-based representations and weight-sharing modular policies based on the message-passing framework. However, as recent literature has shown, message passing can create bottlenecks in information propagation and hinder global coordination. This drawback becomes even more severe in tasks where high-level planning is crucial. Indeed, in such scenarios, each modular policy (e.g., one controlling a joint of a robot) must coordinate not only for basic locomotion but also to achieve high-level goals, such as navigating a maze. A classical solution to such pitfalls is to resort to hierarchical decision-making. In this work, we adopt the Feudal Reinforcement Learning paradigm to develop agents in which control actions are the outcome of a hierarchical (pyramidal) message-passing process. In the proposed Feudal Graph Reinforcement Learning (FGRL) framework, high-level decisions at the top of the hierarchy are propagated through a layered graph representing a hierarchy of policies: lower layers mimic the morphology of the physical system, while upper layers capture more abstract sub-modules. The purpose of this preliminary work is to formalize the framework and provide proof-of-concept experiments on benchmark environments (MuJoCo locomotion tasks). Empirical evaluation shows promising results on both standard benchmarks and zero-shot transfer learning settings.
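To make the pyramidal message-passing idea concrete, the sketch below shows a toy top-down pass over a layered graph: a manager node refines and forwards goal vectors through intermediate sub-modules down to joint-level worker nodes, which output actions. All names, layer sizes, and the single-layer "modules" are illustrative assumptions, not the paper's actual FGRL architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def module(in_dim, out_dim):
    """Hypothetical single-layer map standing in for a learned module."""
    W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    return lambda x: np.tanh(W @ x)

# Layered hierarchy: node 0 = manager, nodes 1-2 = abstract sub-modules,
# nodes 3-6 = joint-level workers mirroring the agent's morphology.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}      # parent -> child ids
obs = {i: rng.normal(size=8) for i in range(7)}   # per-node local observations

obs_dim, goal_dim = 8, 4
emit_goal = module(obs_dim + goal_dim, goal_dim)  # shared (weight-tied) goal head
act = module(obs_dim + goal_dim, 1)               # shared worker policy head

def top_down(node, goal):
    """Propagate goals down the hierarchy; leaf workers emit actions."""
    h = np.concatenate([obs[node], goal])
    if node not in children:          # leaf: joint-level worker
        return {node: act(h)}
    sub_goal = emit_goal(h)           # refine the incoming goal for children
    actions = {}
    for c in children[node]:
        actions.update(top_down(c, sub_goal))
    return actions

# The manager starts from a null goal; each leaf yields one joint action.
actions = top_down(0, np.zeros(goal_dim))
```

Weight sharing across nodes (here, the single `emit_goal` and `act` maps) is what makes the policy composable across agents with different structures: only the `children` graph changes.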