World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there often exist noncontrollable dynamics that are independent of, or only sparsely dependent on, action signals, making it challenging to learn effective world models. To address this issue, we propose Iso-Dream++, a model-based reinforcement learning approach with two main contributions. First, we optimize inverse dynamics to encourage the world model to isolate controllable state transitions from the mixed spatiotemporal variations of the environment. Second, we perform policy optimization on the decoupled latent imaginations, rolling out noncontrollable states into the future and adaptively associating them with the current controllable state. This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild; for example, a self-driving car can anticipate the movement of other vehicles and thereby avoid potential risks. Beyond our previous work, we further consider the sparse dependencies between controllable and noncontrollable states, address the training collapse problem of state decoupling, and validate our approach in transfer learning setups. Our empirical study demonstrates that Iso-Dream++ significantly outperforms existing reinforcement learning models on CARLA and DeepMind Control.
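The two ideas above can be illustrated with a toy linear sketch (not the paper's actual architecture): a controllable latent state driven by actions, a noncontrollable latent state that evolves on its own, an inverse-dynamics model that recovers the action from consecutive controllable states, and a rollout of only the noncontrollable branch into the future. All matrices and dimensions here are hypothetical, chosen only to make the decoupling concrete.

```python
import numpy as np

# Toy decoupled world model (illustrative, with assumed linear dynamics).
# Controllable branch: s_ctrl evolves under the agent's action.
# Noncontrollable branch: s_free evolves independently of actions.
A_ctrl = np.array([[0.9, 0.0], [0.0, 0.9]])   # controllable transition (assumed)
B = np.array([[1.0], [0.5]])                   # action-effect matrix (assumed)
A_free = np.array([[0.0, -1.0], [1.0, 0.0]])   # noncontrollable rotation (assumed)

def step(s_ctrl, s_free, a):
    """One transition of each branch; only s_ctrl depends on the action."""
    return A_ctrl @ s_ctrl + B @ a, A_free @ s_free

def inverse_dynamics(s_ctrl, s_ctrl_next):
    """Recover the action from consecutive controllable states.
    In this linear toy it reduces to least squares; training a network
    with this objective pushes action-relevant variation into s_ctrl."""
    residual = s_ctrl_next - A_ctrl @ s_ctrl
    a_hat, *_ = np.linalg.lstsq(B, residual, rcond=None)
    return a_hat

s_ctrl = np.array([1.0, 0.0])
s_free = np.array([0.0, 1.0])
a = np.array([0.3])
s_ctrl_next, s_free_next = step(s_ctrl, s_free, a)
a_hat = inverse_dynamics(s_ctrl, s_ctrl_next)

# Rolling out only the noncontrollable branch into the future lets the
# policy anticipate dynamics it cannot influence (e.g. other vehicles),
# which it can then adaptively associate with the current s_ctrl.
future_free = [s_free_next]
for _ in range(3):
    future_free.append(A_free @ future_free[-1])
```

In the toy, the inverse-dynamics estimate `a_hat` exactly recovers the true action, since the action only enters the controllable branch; in Iso-Dream++ the analogous objective is what encourages the learned latents to separate cleanly.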