Not having access to compact and meaningful representations is known to significantly increase the complexity of reinforcement learning (RL). For this reason, it can be useful to perform state representation learning (SRL) before tackling RL tasks. However, obtaining a good state representation is only possible if a large diversity of transitions is observed, which can require difficult exploration, especially if the environment is initially reward-free. To solve the problems of exploration and SRL in parallel, we propose a new approach called XSRL (eXploratory State Representation Learning). On the one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the maximization objective of a discovery policy. This results in a policy that seeks complex transitions from which the trained models can effectively learn. Our experimental results show that the approach leads to efficient exploration in challenging environments with image observations, and to state representations that significantly accelerate learning in RL tasks.