World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there commonly exist noncontrollable dynamics independent of the action signals, making it difficult to learn effective world models. To tackle this problem, we present a novel reinforcement learning approach named Iso-Dream, which improves the Dream-to-Control framework in two aspects. First, by optimizing the inverse dynamics, we encourage the world model to learn controllable and noncontrollable sources of spatiotemporal changes on isolated state transition branches. Second, we optimize the behavior of the agent on the decoupled latent imaginations of the world model. Specifically, to estimate state values, we roll out the noncontrollable states into the future and associate them with the current controllable state. In this way, the isolation of dynamics sources can greatly benefit long-horizon decision-making of the agent, such as a self-driving car that can avoid potential risks by anticipating the movement of other vehicles. Experiments show that Iso-Dream is effective in decoupling the mixed dynamics and remarkably outperforms existing approaches in a wide range of visual control and prediction domains.
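The core mechanism can be illustrated with a toy sketch. This is not the paper's implementation (which operates on learned latent states of a recurrent world model); it is a minimal, assumed linear-dynamics analogue showing the two isolated transition branches and how rolling out the noncontrollable branch informs value estimation. All transition matrices, the `value_estimate` heuristic, and the driving interpretation are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical linear dynamics standing in for the two learned branches.
A_ctrl = np.array([[0.9, 0.0], [0.0, 0.9]])   # controllable transition
B_act  = np.array([[1.0], [0.5]])             # effect of the action signal
A_free = np.array([[0.0, -1.0], [1.0, 0.0]])  # noncontrollable transition (a rotation)

def step_controllable(s, a):
    """Controllable branch: state transition depends on the action."""
    return A_ctrl @ s + B_act @ a

def step_noncontrollable(z):
    """Noncontrollable branch: evolves independently of any action."""
    return A_free @ z

def rollout_noncontrollable(z, horizon):
    """Roll the action-free branch forward to anticipate future context."""
    future = []
    for _ in range(horizon):
        z = step_noncontrollable(z)
        future.append(z)
    return future

def value_estimate(s, z, horizon=5):
    """Toy value: the margin between the agent's controllable state and
    the anticipated future noncontrollable states (e.g. other vehicles).
    Higher is safer; the real method learns this association end to end."""
    future = rollout_noncontrollable(z, horizon)
    return min(np.linalg.norm(s - zf) for zf in future)

s = np.array([1.0, 0.0])   # current controllable state (the agent)
z = np.array([0.0, 1.0])   # current noncontrollable state (an obstacle)
a_stay = np.array([0.0])
a_move = np.array([1.0])

# The agent prefers the action whose resulting state stays clear of the
# anticipated noncontrollable trajectory, not just its current position.
v_stay = value_estimate(step_controllable(s, a_stay), z)
v_move = value_estimate(step_controllable(s, a_move), z)
```

Here `a_stay` leaves the agent in the path the obstacle will sweep through, so `v_move > v_stay`: anticipating the noncontrollable rollout changes the preferred action even though the obstacle is currently far away.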