In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training. This issue is intensified for image-based RL where a change in one variable, such as the background colour, can change many pixels in the image, and in turn can change all values in the agent's internal representation of the image. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations. We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Due to the disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
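To make the idea of an auxiliary self-supervised task on consecutive observations concrete, below is a minimal sketch of how such a loss could be attached to the image encoder shared with an RL agent. This is an illustrative stand-in only, not the exact TED objective (which the abstract does not spell out): the class and function names (`ConvEncoder`, `TemporalClassifier`, `temporal_aux_loss`), the network sizes, and the positive/negative pair construction are all assumptions.

```python
# Hypothetical sketch: a temporal self-supervised auxiliary loss added to an
# image encoder shared with an RL agent. NOT the published TED objective;
# names, architectures, and the loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Maps image observations to a latent representation z (assumes 84x84x3 inputs)."""

    def __init__(self, latent_dim: int = 50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, 3, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, latent_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(obs))


class TemporalClassifier(nn.Module):
    """Scores whether a latent pair (z_t, z_t1) is temporally consecutive."""

    def __init__(self, latent_dim: int = 50, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_t: torch.Tensor, z_t1: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, z_t1], dim=-1)).squeeze(-1)


def temporal_aux_loss(encoder, classifier, obs_t, obs_t1):
    """Self-supervised loss exploiting the sequential nature of observations.

    Positives: (o_t, o_{t+1}) pairs taken from consecutive timesteps.
    Negatives: the same o_t paired with a next observation shuffled across
    the batch, which breaks the temporal structure.
    """
    z_t, z_t1 = encoder(obs_t), encoder(obs_t1)
    perm = torch.randperm(z_t1.shape[0])
    pos_logits = classifier(z_t, z_t1)        # consecutive pairs
    neg_logits = classifier(z_t, z_t1[perm])  # shuffled, non-consecutive pairs
    pos_loss = F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
    neg_loss = F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits))
    return pos_loss + neg_loss


if __name__ == "__main__":
    encoder, classifier = ConvEncoder(), TemporalClassifier()
    obs_t = torch.rand(8, 3, 84, 84)   # batch of observations at time t
    obs_t1 = torch.rand(8, 3, 84, 84)  # corresponding observations at time t+1
    aux = temporal_aux_loss(encoder, classifier, obs_t, obs_t1)
    # In a full agent this term would be added to the RL loss, e.g.
    # total_loss = rl_loss + aux_weight * aux
    print(float(aux))
```

In such a setup the auxiliary gradient flows into the shared encoder alongside the RL objective, which is the general mechanism by which an auxiliary task can shape the representation the policy sees.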