Efficient and robust policy transfer remains a key challenge for reinforcement learning to become viable for real-world robotics. Policy transfer through warm initialization, imitation, or interaction with a large set of agents over randomized task instances has been commonly applied to solve a variety of reinforcement learning tasks. However, this is far from how skill transfer happens in the biological world: humans and animals can quickly adapt learned behaviors between similar tasks and learn new skills when presented with new situations. Here we seek to answer the question: will learning to combine adaptation and exploration lead to a more efficient transfer of policies between domains? We introduce a principled mechanism that can "Adapt-to-Learn", that is, adapt the source policy to learn to solve a target task with significant transition differences and uncertainties. We show that the proposed method learns to seamlessly combine learning from adaptation and exploration, yielding a robust policy transfer algorithm with significantly reduced sample complexity when transferring skills between related tasks.
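To make the idea of mixing adaptation with exploration concrete, the sketch below is a minimal, purely illustrative example and not the paper's algorithm: a linear-Gaussian policy on a toy 1-D target task is updated with a convex combination of a REINFORCE gradient on the target reward (exploration) and a gradient that pulls the policy mean toward a fixed source policy (adaptation). The mixing weight `BETA`, the toy dynamics in `step`, and the source parameters `theta_source` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA = 0.5      # fixed exploration noise of the Gaussian policy
BETA = 0.5       # assumed weight trading off exploration vs. adaptation
LR = 0.05        # learning rate
EPISODES = 200
HORIZON = 20

theta_source = np.array([1.0])   # pretrained source-policy parameters (assumed)
theta = theta_source.copy()      # target policy, initialized from the source

def step(s, a):
    """Toy target dynamics, slightly shifted from the source task (assumption)."""
    s_next = 0.8 * s + a + 0.1
    reward = -float(s_next ** 2)  # drive the state toward zero
    return s_next, reward

for ep in range(EPISODES):
    s = rng.normal(size=1)
    logp_grad_sum = np.zeros_like(theta)   # accumulates REINFORCE log-prob gradients
    grad_adapt = np.zeros_like(theta)      # accumulates adaptation gradients
    episode_return = 0.0
    for t in range(HORIZON):
        mu = theta * s                     # policy mean is linear in the state
        a = mu + SIGMA * rng.normal(size=1)
        s_next, r = step(s, a)
        episode_return += r
        # Exploration term: gradient of log N(a; mu, SIGMA^2) w.r.t. theta
        logp_grad_sum += (a - mu) / SIGMA ** 2 * s
        # Adaptation term: pull the policy mean toward the source policy's mean
        grad_adapt += -(mu - theta_source * s) * s
        s = s_next
    grad_explore = logp_grad_sum * episode_return   # plain REINFORCE estimator
    # Combined update: adaptation and exploration gradients mixed by BETA
    theta += LR * (BETA * grad_explore + (1 - BETA) * grad_adapt) / HORIZON

print("adapted parameters:", theta, "last episode return:", episode_return)
```

In this toy setting, setting `BETA = 0` reduces the update to pure adaptation (the policy stays at the source behavior), while `BETA = 1` recovers pure exploration from scratch; intermediate values trade off the two signals, which is the trade-off the abstract argues should itself be learned.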