We study the problem of representational transfer in RL, where an agent first pretrains on a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a \emph{target task}. We propose a new notion of task relatedness between source and target tasks, and develop a novel approach for representational transfer under this assumption. Concretely, we show that given generative access to the source tasks, we can discover a representation with which standard linear RL techniques quickly converge to a near-optimal policy in the target task. The sample complexity is close to that of knowing the ground-truth features in the target task, and comparable to prior representation learning results in the source tasks. We complement our positive results with lower bounds in the absence of generative access, and validate our findings with an empirical evaluation on rich-observation MDPs that require deep exploration. In our experiments, we observe a speed-up in learning on the target task from pre-training, and also validate the need for generative access to the source tasks.