Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL). However, existing auxiliary tasks do not take the characteristics of RL problems into consideration and are unsupervised. By leveraging returns, the most important feedback signals in RL, we propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns. Our auxiliary loss is theoretically justified to learn representations that capture the structure of a new form of state-action abstraction, under which state-action pairs with similar return distributions are aggregated together. In the low-data regime, our algorithm outperforms strong baselines on complex tasks in Atari games and the DeepMind Control Suite, and achieves even better performance when combined with existing auxiliary tasks.
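To make the idea concrete, the sketch below illustrates one way a return-based auxiliary loss of this flavor could be implemented: state-action embeddings whose returns are close are treated as positives and pulled together, while those with clearly different returns are pushed apart. This is only an illustrative sketch under our own assumptions; the function name `return_contrastive_loss`, the fixed return-difference `threshold`, and the InfoNCE-style form are not taken from the paper, which defines its own objective and abstraction.

```python
import torch
import torch.nn.functional as F

def return_contrastive_loss(embeddings, returns, threshold=0.1, temperature=0.5):
    """Illustrative auxiliary loss (assumed form, not the paper's exact objective).

    Encourages state-action embeddings whose returns differ by less than
    `threshold` to be similar, and embeddings with clearly different returns
    to be dissimilar.

    embeddings: (B, D) tensor of state-action representations.
    returns:    (B,)   tensor of (discounted) returns, one per pair.
    """
    # Pairwise cosine similarities between all embeddings in the batch.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                          # (B, B)

    # Treat pairs as positives when their returns are close.
    ret_diff = (returns.unsqueeze(0) - returns.unsqueeze(1)).abs()
    positives = (ret_diff < threshold).float()
    positives.fill_diagonal_(0.0)                          # ignore self-pairs

    # InfoNCE-style objective: for each anchor, its positives should
    # dominate the softmax over all other elements in the batch.
    diag = torch.eye(len(returns), dtype=torch.bool, device=sim.device)
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(diag, float('-inf')), dim=1, keepdim=True
    )
    n_pos = positives.sum(dim=1).clamp(min=1.0)
    return -(positives * log_prob).sum(dim=1).div(n_pos).mean()
```

In practice, such a term would be added to the RL objective with a small weight, using the same encoder that feeds the value or policy network, so that the representation is shaped by return information without altering the base algorithm.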