In vision-based reinforcement learning (RL) tasks, it is common to introduce auxiliary tasks with a surrogate self-supervised loss in order to obtain more semantic representations and improve sample efficiency. However, abundant information carried by these self-supervised auxiliary tasks is disregarded, because the representation-learning part and the decision-making part are kept separate. To make full use of the information in auxiliary tasks, we present a simple yet effective idea: employ the self-supervised loss as an intrinsic reward, a scheme we call Intrinsically Motivated Self-Supervised learning in Reinforcement learning (IM-SSR). We formally show that the self-supervised loss can be decomposed into a term that drives exploration of novel states and a term that improves robustness by eliminating nuisance factors. IM-SSR can be effortlessly plugged into any reinforcement learning algorithm with a self-supervised auxiliary objective at nearly no additional cost. Combined with IM-SSR, the underlying algorithms achieve salient improvements in both sample efficiency and generalization on various vision-based robotics tasks from the DeepMind Control Suite, especially when the reward signal is sparse.
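As a minimal sketch of the IM-SSR idea under stated assumptions, the snippet below reuses a per-sample contrastive (self-supervised) loss as an intrinsic bonus added to the environment reward. The encoder, the augmentation, and the coefficient `beta` are illustrative placeholders, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def augment(obs: torch.Tensor) -> torch.Tensor:
    # Placeholder random-shift augmentation for image batches of shape (N, C, H, W).
    pad = F.pad(obs, (4, 4, 4, 4), mode="replicate")
    h, w = obs.shape[-2:]
    top = torch.randint(0, 9, (1,)).item()
    left = torch.randint(0, 9, (1,)).item()
    return pad[..., top:top + h, left:left + w]

def intrinsic_reward_batch(encoder, obs_batch, beta=0.1):
    """Reuse a per-sample self-supervised (InfoNCE-style) loss as an intrinsic
    reward: a high SSL loss marks a novel / poorly represented observation."""
    with torch.no_grad():
        z1 = F.normalize(encoder(augment(obs_batch)), dim=-1)  # anchor view
        z2 = F.normalize(encoder(augment(obs_batch)), dim=-1)  # positive view
        logits = z1 @ z2.t()                                   # cosine-similarity logits
        labels = torch.arange(obs_batch.size(0), device=logits.device)
        # Per-sample loss (no reduction), so each transition gets its own bonus.
        ssl_loss = F.cross_entropy(logits, labels, reduction="none")
    return beta * ssl_loss

# Usage sketch: r_total = r_env + intrinsic_reward_batch(encoder, obs_batch)
```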