Transfer learning can speed up training in machine learning and is regularly used in classification tasks. It reuses prior knowledge from other tasks to pre-train networks for new tasks. In reinforcement learning, learning a behavior policy that can be applied to new environments remains a challenge, especially for tasks that require extensive planning. Sokoban is a challenging puzzle game that has been widely used as a benchmark in planning-based reinforcement learning. In this paper, we show how prior knowledge improves learning in Sokoban tasks. We find that reusing previously learned feature representations can accelerate learning on new, more complex instances. In effect, we show how curriculum learning, from simple to complex tasks, works in Sokoban. Furthermore, feature representations learned on simpler instances are more general and thus lead to positive transfer to more complex tasks, but not vice versa. We also study which parts of the learned knowledge are most important for successful transfer, and identify which layers should be used for pre-training.
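To make the layer-wise transfer concrete, the following is a minimal sketch (not taken from the paper) of how convolutional feature layers trained on simple Sokoban instances could be reused to initialize a network for harder instances while the policy head is re-learned. The architecture, input channel count, and all names here are hypothetical illustrations in PyTorch.

```python
import torch.nn as nn

class SokobanPolicy(nn.Module):
    """Hypothetical CNN policy over a Sokoban board representation."""
    def __init__(self, num_actions: int = 4):
        super().__init__()
        # Convolutional feature extractor; these are the layers considered for transfer.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Task-specific policy head, re-learned for the new, more complex instances.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_actions),
        )

    def forward(self, x):
        return self.head(self.features(x))

# Pre-train `source` on simple instances (training loop omitted), then transfer.
source = SokobanPolicy()
target = SokobanPolicy()
# Reuse the feature layers learned on simpler tasks; keep the head freshly initialized.
target.features.load_state_dict(source.features.state_dict())
# Optionally freeze the transferred layers so only the head adapts at first.
for p in target.features.parameters():
    p.requires_grad = False
```

Which layers to copy (and whether to freeze them) is exactly the kind of choice the layer-wise analysis in the paper is meant to inform.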