Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. However, previous studies often assume a single-mode, homogeneous task distribution, ignoring possible structured heterogeneity among tasks. Exploiting such structure can better facilitate knowledge sharing among related tasks and thus improve sample efficiency. In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL. We develop a dedicated exploratory policy that discovers task structures in a divide-and-conquer fashion. Knowledge of the identified clusters helps narrow the search space of task-specific information, leading to more sample-efficient policy adaptation. Experiments on various MuJoCo tasks show that the proposed method can effectively unravel cluster structures in both rewards and state dynamics, yielding strong advantages over a set of state-of-the-art baselines.
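To make the clustering idea concrete, below is a minimal sketch, assuming each training task has already been summarized as a low-dimensional latent embedding by an exploratory policy. The embeddings, cluster count, and variable names are illustrative assumptions, not the paper's actual algorithm; the sketch only shows how identified clusters can narrow the search space when adapting to a new task.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical illustration (not the paper's method): suppose an
# exploratory policy has produced a 2-D embedding for each training task.
rng = np.random.default_rng(0)

# Synthetic stand-in for learned task embeddings: three latent task
# families (e.g., differing reward functions or state dynamics).
centers = np.array([[2.0, 0.0], [-2.0, 1.5], [0.0, -2.5]])
task_embeddings = np.vstack(
    [c + 0.3 * rng.standard_normal((20, 2)) for c in centers]
)

# Divide: recover the latent task families by clustering the embeddings.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(task_embeddings)

# Conquer: a new task is matched against a few cluster centroids rather
# than all prior tasks, so adaptation starts from the shared knowledge
# of its assigned cluster.
new_task_embedding = np.array([[1.8, 0.2]])
assigned_cluster = kmeans.predict(new_task_embedding)[0]
print(f"New task assigned to cluster {assigned_cluster}")
```

In this toy setting, assigning a new task to one of three centroids replaces a search over all sixty prior tasks, which is the intuition behind the claimed gain in sample efficiency.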