Learning effective representations in image-based environments is crucial for sample-efficient Reinforcement Learning (RL). Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent: learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations. Furthermore, we would like to learn representations that not only generalize across tasks but also accelerate downstream exploration for efficient task-specific training. To address these challenges, we propose Proto-RL, a self-supervised framework that ties representation learning to exploration through prototypical representations. These prototypes simultaneously serve as a summarization of the agent's exploratory experience and as a basis for representing observations. We pre-train these task-agnostic representations and prototypes in environments without access to downstream task information. This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
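To make the two roles of the prototypes concrete, the sketch below illustrates one plausible reading: an observation is represented by its soft assignment over a set of learnable prototype vectors, and an exploration bonus is computed from distances to previously visited latents. This is a minimal illustration, not the paper's implementation; all names (`ProtoSketch`, `num_protos`, `temperature`, the k-NN bonus, and the MLP stand-in for the image encoder) are assumptions for the sake of the example.

```python
# Minimal sketch (assumed, not the authors' code) of prototypes serving both
# as a basis for representing observations and as a source of an
# exploration signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoSketch(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=32, num_protos=16):
        super().__init__()
        # stand-in MLP encoder; an image-based agent would use a conv encoder
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim))
        # learnable prototype vectors: the "basis" observations project onto
        self.protos = nn.Parameter(torch.randn(num_protos, latent_dim))

    def project(self, obs, temperature=0.1):
        # represent an observation as soft-assignment probabilities
        # over the prototypes
        z = F.normalize(self.encoder(obs), dim=-1)
        c = F.normalize(self.protos, dim=-1)
        return F.softmax(z @ c.T / temperature, dim=-1)

def intrinsic_reward(latents, memory, k=3):
    # particle-style exploration bonus: distance to the k-th nearest
    # neighbor among stored latents (larger distance => more novel state)
    dists = torch.cdist(latents, memory)             # (batch, memory_size)
    return dists.topk(k, largest=False).values[:, -1]

# usage: project a batch of observations and score their novelty
model = ProtoSketch()
obs = torch.randn(8, 64)
probs = model.project(obs)                           # (8, 16) assignments
z = F.normalize(model.encoder(obs), dim=-1)
memory = F.normalize(torch.randn(128, 32), dim=-1)   # stand-in latent buffer
r_int = intrinsic_reward(z, memory)                  # per-observation bonus
```

In this reading, the prototypes summarize the visited state distribution (they anchor the latent memory used for the novelty bonus) while the soft assignments give a compact, task-agnostic representation of each observation; both pieces can be trained without any downstream reward.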