We consider the problem of knowledge transfer for an agent facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.
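Schematically, the Lipschitz continuity claim can be stated as follows (here $d$ stands for the introduced metric between MDPs and $L$ for the associated Lipschitz constant; the precise definitions are assumptions deferred to the body of the paper):

$$
\bigl| V^*_{M}(s) - V^*_{M'}(s) \bigr| \;\le\; L \, d(M, M') \qquad \text{for all states } s,
$$

so that the optimal value function of a new task $M'$ can be bounded, and hence transferred, from that of a previously solved task $M$.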