The purpose of multi-task reinforcement learning (MTRL) is to train a single policy that can be applied to a set of different tasks. Sharing parameters allows us to take advantage of the similarities among tasks. However, differences in the content and difficulty of the tasks raise challenges in deciding both which tasks should share parameters and which parameters should be shared, as well as optimization challenges that arise from parameter sharing. In this work, we introduce a parameter-compositional approach (PaCo) to address these challenges. In this framework, we learn a policy subspace represented by a set of shared parameters. Policies for all the single tasks lie in this subspace and can be composed by interpolating over the learned set. This allows not only flexible parameter sharing but also a natural way to improve training. We demonstrate state-of-the-art performance on Meta-World benchmarks, verifying the effectiveness of the proposed approach.
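To make the compositional idea concrete, below is a minimal sketch of one plausible reading of the abstract: task-specific parameters are formed as an interpolation over a shared set of base parameter sets, so that each task corresponds to a point in the learned subspace. All names (`CompositionalLinear`, `num_param_sets`, `task_w`) and the specific composition rule are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class CompositionalLinear(nn.Module):
    """Linear layer whose weights are interpolated from K shared parameter sets.

    Hypothetical sketch: the task-specific weight is a weighted combination
    W_task = sum_k w[k] * Phi[k], where Phi holds K base parameter sets shared
    across all tasks and w is a per-task composition vector.
    """

    def __init__(self, in_dim: int, out_dim: int, num_param_sets: int):
        super().__init__()
        # K base parameter sets spanning the learned policy subspace.
        self.phi = nn.Parameter(torch.randn(num_param_sets, out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_param_sets, out_dim))

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # w: (K,) composition vector selecting a point in the parameter subspace.
        weight = torch.einsum("k,koi->oi", w, self.phi)  # interpolated weight matrix
        bias = torch.einsum("k,ko->o", w, self.bias)     # interpolated bias
        return x @ weight.t() + bias


# Only the per-task composition vectors differ across tasks; the shared set
# phi captures cross-task structure, giving flexible parameter sharing.
num_tasks, K = 10, 5
task_w = nn.Parameter(torch.full((num_tasks, K), 1.0 / K))  # start as a uniform mixture

layer = CompositionalLinear(in_dim=39, out_dim=64, num_param_sets=K)  # 39 is an arbitrary obs dim
obs = torch.randn(32, 39)           # batch of observations attributed to task 3
features = layer(obs, task_w[3])    # task-specific forward pass through shared parameters
```

In such a scheme, training updates both the shared base sets and the per-task composition vectors; because every task's policy is constrained to the same low-dimensional subspace, knowledge transfers through the shared sets while each task retains its own mixture.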