Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control. However, it poses a challenging multi-task reinforcement learning problem, as the optimal policy may be quite different across robots and critically depend on the morphology. Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies, but pay little attention to the dependency of a robot's control policy on its morphology context. In this paper, we propose a hierarchical architecture to better model this dependency via contextual modulation, which includes two key submodules: (1) Instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) We propose a morphology-dependent attention mechanism to modulate the interactions between different limbs in a robot. Experimental results show that our method not only improves learning performance on a diverse set of training robots, but also generalizes better to unseen morphologies in a zero-shot fashion.
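The two submodules described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the single attention layer, the linear hypernetwork, the per-limb context vectors `limb_ctx`, and all parameter names and dimensions are assumptions chosen for brevity. The sketch shows (1) a hypernetwork mapping each limb's morphology context to the weights of that limb's action decoder, instead of sharing one decoder across all robots, and (2) an attention map between limbs whose logits receive an additional term computed from the morphology context.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, ctx_dim, feat_dim, act_dim, scale=0.1):
    # All parameter names here are hypothetical, for illustration only.
    def mat(*shape):
        return rng.normal(0.0, scale, shape)

    return {
        "Wq": mat(feat_dim, feat_dim),   # state -> query
        "Wk": mat(feat_dim, feat_dim),   # state -> key
        "Wv": mat(feat_dim, feat_dim),   # state -> value
        "Wcq": mat(ctx_dim, feat_dim),   # morphology context -> query
        "Wck": mat(ctx_dim, feat_dim),   # morphology context -> key
        "Hw": mat(ctx_dim, feat_dim * act_dim),  # hypernetwork: decoder weights
        "Hb": mat(ctx_dim, act_dim),             # hypernetwork: decoder bias
    }


def policy(params, limb_feats, limb_ctx):
    """limb_feats: (n_limbs, feat_dim) per-limb state features.
    limb_ctx:   (n_limbs, ctx_dim) per-limb morphology context.
    Returns per-limb actions and the limb-to-limb attention map."""
    n, d = limb_feats.shape
    q = limb_feats @ params["Wq"]
    k = limb_feats @ params["Wk"]
    v = limb_feats @ params["Wv"]

    # State-based attention logits between limbs.
    logits = q @ k.T / np.sqrt(d)
    # Morphology-dependent modulation: extra logits from context alone.
    cq = limb_ctx @ params["Wcq"]
    ck = limb_ctx @ params["Wck"]
    logits = logits + cq @ ck.T / np.sqrt(d)
    attn = softmax(logits, axis=-1)
    h = attn @ v  # (n_limbs, feat_dim)

    # Hypernetwork: generate each limb's decoder from its context.
    act_dim = params["Hb"].shape[1]
    W = (limb_ctx @ params["Hw"]).reshape(n, d, act_dim)
    b = limb_ctx @ params["Hb"]
    actions = np.tanh(np.einsum("nd,nda->na", h, W) + b)
    return actions, attn


rng = np.random.default_rng(0)
params = init_params(rng, ctx_dim=4, feat_dim=8, act_dim=1)
# Two robots with different limb counts share the same parameters.
for n_limbs in (5, 7):
    feats = rng.normal(size=(n_limbs, 8))
    ctx = rng.normal(size=(n_limbs, 4))
    actions, attn = policy(params, feats, ctx)
    print(n_limbs, actions.shape, attn.shape)
```

Because every weight matrix is indexed by feature or context dimension rather than by limb count, the same parameters apply to robots with any number of limbs, which is what makes zero-shot evaluation on unseen morphologies possible in this kind of design.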