Accelerating the learning of complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between the source and target tasks is low. This work proposes the REPresentation And INstance Transfer (REPAINT) algorithm for knowledge transfer in deep reinforcement learning. REPAINT not only transfers the representation of a pre-trained teacher policy in on-policy learning, but also uses an advantage-based experience selection approach to transfer useful samples collected by following the teacher policy in off-policy learning. Our experimental results on several benchmark tasks show that REPAINT significantly reduces the total training time across general cases of task similarity. In particular, when the source tasks are dissimilar to, or are sub-tasks of, the target tasks, REPAINT outperforms other baselines in both training-time reduction and asymptotic return.
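To make the idea of advantage-based experience selection concrete, the following is a minimal sketch. It is not the paper's implementation and rests on these assumptions: the student's critic supplies state-value estimates for a trajectory collected by the teacher policy, and advantages are estimated with generalized advantage estimation (GAE). The `Transition` record, the function names, and the `threshold` parameter are hypothetical illustrations. A teacher sample is kept for off-policy learning only if its estimated advantage exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    # Hypothetical record for one step of a teacher-collected trajectory.
    state: object
    action: object
    reward: float
    done: bool


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   dones: Sequence[bool], gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """GAE advantages; `values` has len(rewards) + 1 entries
    (the last one bootstraps the state after the final step)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error under the (assumed) student critic.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Accumulate discounted TD errors; reset at episode boundaries.
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


def select_by_advantage(transitions: Sequence[Transition],
                        values: Sequence[float], threshold: float,
                        gamma: float = 0.99,
                        lam: float = 0.95) -> List[Tuple[Transition, float]]:
    """Keep only teacher samples whose estimated advantage exceeds `threshold`."""
    rewards = [tr.reward for tr in transitions]
    dones = [tr.done for tr in transitions]
    adv = gae_advantages(rewards, values, dones, gamma, lam)
    return [(tr, a) for tr, a in zip(transitions, adv) if a > threshold]
```

With `lam=0`, each advantage reduces to the one-step TD error, so with zero value estimates a three-step trajectory with rewards `[1, 0, 2]` and `threshold=0.5` keeps only the first and last samples. The threshold trades off how much teacher data is reused against how closely it must agree with what the student's critic considers better than expected.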