Potential-based reward shaping (PBRS) is a category of machine learning methods that aims to improve the learning speed of a reinforcement learning agent by extracting and using additional knowledge while the agent performs a task. Transfer learning involves two steps: extracting knowledge from previously learned tasks, and transferring that knowledge for use in a target task. The latter step is well covered in the literature, with various methods proposed for it, while the former has been explored less. The type of knowledge that is transferred therefore matters a great deal and can lead to considerable improvement. In the literature on both transfer learning and potential-based reward shaping, a subject that has never been addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that attempts to extract knowledge from the learning process, specifically from the cumulative rewards of episodes. The proposed method was evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
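For readers unfamiliar with the general technique, the standard form of potential-based reward shaping adds the term gamma * phi(s') - phi(s) to the environment reward, where phi is a potential function over states. The following is a minimal sketch of that general form only. It does not reproduce the method proposed in this paper, since the abstract does not specify how the potential is built from episodes' cumulative rewards. The names `shaped_reward`, `potential`, `gamma`, and `done`, the toy distance-based potential, and the convention of using a zero potential at terminal states are all assumptions made for illustration.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return r + gamma * phi(s') - phi(s), the generic PBRS reward.

    Assumption: the potential of a terminal state is taken to be zero.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


# Hypothetical toy potential on a 1-D chain: negative distance to a goal at 5.
GOAL = 5


def toy_potential(state: int) -> float:
    return -abs(GOAL - state)


# Moving from state 3 to state 4 (closer to the goal) earns a positive bonus.
bonus = shaped_reward(0.0, 3, 4, toy_potential, gamma=1.0)
```

With gamma set to 1 and no terminal transition, the shaping terms along a trajectory telescope to phi(s_T) - phi(s_0), so the total added reward depends only on the start and end states and not on the path taken between them.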