Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations by exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, our experiments also show that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
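The abstract describes TED only at a high level, as a self-supervised auxiliary task that exploits the temporal order of RL observations. As a loose illustration of how such an auxiliary objective might be attached to an image-based RL encoder, the sketch below scores temporally consecutive latent pairs against shuffled (non-consecutive) pairs, in the style of permutation-contrastive learning. All names here (Encoder, TemporalClassifier, temporal_aux_loss) are hypothetical; the exact TED objective, including the classifier structure that induces disentanglement, is defined in the paper body and is not reproduced here.

```python
# Hypothetical sketch: a temporal self-supervised auxiliary loss for an
# image-based RL encoder. Not the paper's reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps image observations to a latent representation z."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))


class TemporalClassifier(nn.Module):
    """Scores whether a latent pair (z_t, z_next) is temporally consecutive."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, z_t, z_next):
        return self.net(torch.cat([z_t, z_next], dim=-1)).squeeze(-1)


def temporal_aux_loss(encoder, classifier, obs_t, obs_next):
    """Contrastive loss: consecutive pairs (positives) vs. shuffled pairs (negatives)."""
    z_t, z_next = encoder(obs_t), encoder(obs_next)
    # Negatives: break temporal order by shuffling z_next within the batch.
    z_neg = z_next[torch.randperm(z_next.size(0))]
    logits_pos = classifier(z_t, z_next)
    logits_neg = classifier(z_t, z_neg)
    return (F.binary_cross_entropy_with_logits(logits_pos, torch.ones_like(logits_pos))
            + F.binary_cross_entropy_with_logits(logits_neg, torch.zeros_like(logits_neg)))
```

In a training loop, this auxiliary loss would typically be scaled by a weight and added to the RL objective, so that gradients from both the policy/value losses and the temporal task shape the encoder's representation.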