Transferring knowledge among various environments is important for efficiently learning multiple tasks online. Most existing methods directly reuse previously learned models or previously learned optimal policies to learn new tasks. However, these methods can be inefficient when the underlying models or optimal policies differ substantially across tasks. In this paper, we propose Template Learning (TempLe), the first PAC-MDP method for multi-task reinforcement learning that can be applied to tasks with varying state/action spaces. TempLe generates transition dynamics templates, abstractions of the transition dynamics across tasks, and gains sample efficiency by extracting similarities between tasks even when their underlying models or optimal policies have limited commonalities. We present two algorithms, one for an "online" setting and one for a "finite-model" setting. We prove that the proposed TempLe algorithms achieve much lower sample complexity than single-task learners or state-of-the-art multi-task methods. We show via systematically designed experiments that TempLe consistently outperforms state-of-the-art multi-task methods (PAC-MDP or not) across various settings and regimes.
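The notion of a transition dynamics template can be pictured as pooling experience from all state-action pairs, possibly in different tasks, that are assigned the same template. The sketch below is a hypothetical illustration under that reading, not the paper's algorithm: `TemplatePool`, `assign`, `observe`, and the abstract outcome indices are invented names, and the rule that assigns templates is assumed to be given.

```python
# Hypothetical sketch (names and structure are assumptions, not the paper's
# implementation): state-action pairs from different tasks that share a
# template pool their transition counts, so experience from one task
# sharpens the dynamics estimate used in another.
from collections import defaultdict

class TemplatePool:
    def __init__(self):
        # template_id -> {abstract outcome index: pooled visit count}
        self.counts = defaultdict(lambda: defaultdict(int))
        # (task, state, action) -> template_id; the assignment rule is assumed given
        self.assignment = {}

    def assign(self, task, state, action, template_id):
        self.assignment[(task, state, action)] = template_id

    def observe(self, task, state, action, outcome_index):
        # outcome_index is an abstract outcome slot shared within the template
        # (e.g., 0 = "intended move", 1 = "slip"), not a raw next state.
        tid = self.assignment[(task, state, action)]
        self.counts[tid][outcome_index] += 1

    def estimate(self, task, state, action):
        # Empirical outcome distribution from counts pooled over every
        # state-action pair (across tasks) mapped to the same template.
        tid = self.assignment[(task, state, action)]
        total = sum(self.counts[tid].values())
        return {o: c / total for o, c in self.counts[tid].items()}

# Two tasks whose (state, action) pairs share template 7 pool their samples.
pool = TemplatePool()
pool.assign(task=0, state="s1", action="up", template_id=7)
pool.assign(task=1, state="s9", action="left", template_id=7)
pool.observe(0, "s1", "up", 0)
pool.observe(1, "s9", "left", 0)
pool.observe(1, "s9", "left", 1)
print(pool.estimate(0, "s1", "up"))  # {0: 0.66..., 1: 0.33...}
```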