Reinforcement learning (RL) is well known to require large amounts of data for agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning remains a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.
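To make the idea concrete, below is a minimal sketch of what fractional parameter transfer could look like in PyTorch. It assumes the fraction rule combines a fresh random initialization with a fraction omega of the source task's parameters (theta_target = theta_random + omega * theta_source); the function name `fractional_transfer` and this additive rule are illustrative assumptions based on the abstract's description, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


def fractional_transfer(target: nn.Module, source: nn.Module, omega: float) -> None:
    """Add a fraction omega of the source network's parameters to the
    target network's fresh random initialization.

    omega = 0.0 recovers plain random initialization (no transfer);
    omega = 1.0 adds the full source parameters on top of the random init.
    Assumes target and source share the same architecture.
    """
    with torch.no_grad():
        for p_target, p_source in zip(target.parameters(), source.parameters()):
            # theta_target = theta_random + omega * theta_source (assumed rule)
            p_target.add_(omega * p_source)


# Hypothetical usage: transfer 20% of a source task's weights.
source_net = nn.Linear(64, 32)  # stand-in for a network trained on a source task
target_net = nn.Linear(64, 32)  # randomly initialized for the new task
fractional_transfer(target_net, source_net, omega=0.2)
```

In contrast to the all-or-nothing approach, omega acts as a continuous knob between random initialization and full parameter transfer, so potentially useful source knowledge is retained in proportion to how relevant it is expected to be.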