In vision-based reinforcement learning (RL) tasks, it is common to attach an auxiliary task with a surrogate self-supervised loss in order to learn more semantic representations and improve sample efficiency. However, much of the information in these self-supervised auxiliary tasks is discarded, because the representation-learning component and the decision-making component are kept separate. To make fuller use of the auxiliary task, we present a simple yet effective idea: employ the self-supervised loss as an intrinsic reward, which we call Intrinsically Motivated Self-Supervised learning in Reinforcement learning (IM-SSR). We formally show that the self-supervised loss can be decomposed into exploration of novel states and robustness improvement through nuisance elimination. IM-SSR can be plugged into any reinforcement learning algorithm with a self-supervised auxiliary objective at almost no additional cost. Combined with IM-SSR, the underlying algorithms achieve salient improvements in both sample efficiency and generalization on various vision-based robotics tasks from the DeepMind Control Suite, especially when the reward signal is sparse.
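A minimal sketch of the core idea follows, assuming a per-sample self-supervised auxiliary loss is already computed elsewhere in the training loop; the names `ssl_loss` and `beta` are illustrative assumptions, not the paper's exact interface. The self-supervised loss is reused as an intrinsic reward and added to the environment reward before the RL update.

```python
import torch


def im_ssr_reward(env_reward: torch.Tensor,
                  ssl_loss: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """Combine extrinsic and intrinsic (self-supervised) rewards.

    env_reward: rewards from the environment, shape (batch,).
    ssl_loss:   per-sample self-supervised auxiliary loss, shape (batch,),
                e.g. a contrastive or reconstruction loss; a high loss
                suggests a novel or poorly represented state, so it is
                treated as an exploration bonus.
    beta:       scaling coefficient for the intrinsic term (hypothetical
                hyperparameter, not taken from the paper).
    """
    # Detach so the RL objective does not backpropagate into the SSL head;
    # the SSL loss is still minimized by its own auxiliary objective.
    intrinsic = ssl_loss.detach()
    return env_reward + beta * intrinsic
```

In use, the critic (or policy) update would consume `im_ssr_reward(r, ssl_loss_per_sample)` in place of the raw environment reward `r`, leaving the rest of the underlying algorithm unchanged.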