Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they can replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and they are applicable to meta-RL. However, evidence from supervised learning suggests that hypernetwork performance is highly sensitive to initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL, evaluating on multiple simulated robotics benchmarks.
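To make the core idea concrete, below is a minimal illustrative sketch (not the paper's architecture or its proposed initialization scheme) of a hypernetwork that maps a task embedding to the weights of a small policy network, so that each task effectively receives its own policy while all tasks share the hypernetwork's parameters. All dimensions and names (TASK_DIM, OBS_DIM, ACT_DIM, HIDDEN) are assumptions chosen for the example.

```python
# Illustrative hypernetwork sketch: a task embedding is mapped to the flat
# parameter vector of a one-hidden-layer policy MLP. This mirrors the idea
# that a hypernetwork can replicate per-task policies while still sharing
# parameters across tasks. Sizes below are arbitrary assumptions.
import torch
import torch.nn as nn

TASK_DIM, OBS_DIM, ACT_DIM, HIDDEN = 8, 16, 4, 32

class PolicyHypernetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Parameter counts of the target policy network.
        self.n_w1 = OBS_DIM * HIDDEN
        self.n_b1 = HIDDEN
        self.n_w2 = HIDDEN * ACT_DIM
        self.n_b2 = ACT_DIM
        n_out = self.n_w1 + self.n_b1 + self.n_w2 + self.n_b2
        # The hypernetwork: task embedding -> flat policy parameters.
        # Note: the initialization of this final Linear layer controls the
        # scale of every generated policy weight, which is why hypernetwork
        # initialization is so consequential.
        self.net = nn.Sequential(
            nn.Linear(TASK_DIM, 64), nn.ReLU(), nn.Linear(64, n_out)
        )

    def forward(self, task_emb: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        theta = self.net(task_emb)  # generated policy parameters
        w1, b1, w2, b2 = torch.split(
            theta, [self.n_w1, self.n_b1, self.n_w2, self.n_b2])
        h = torch.relu(obs @ w1.view(OBS_DIM, HIDDEN) + b1)
        return h @ w2.view(HIDDEN, ACT_DIM) + b2  # action logits/means

hnet = PolicyHypernetwork()
actions = hnet(torch.randn(TASK_DIM), torch.randn(5, OBS_DIM))
print(actions.shape)  # torch.Size([5, 4])
```

In this sketch, a naive (default) initialization of the final layer gives the generated policy weights an uncontrolled scale, which illustrates why initialization schemes tailored to hypernetworks matter; the scheme proposed in the paper addresses exactly this sensitivity.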