Latent world models offer a promising way to learn policies in a compact latent space for tasks with high-dimensional observations. However, generalizing them across diverse environments with unseen dynamics remains challenging. Although the recurrent structure used in recent work helps capture local dynamics, modeling only state transitions, without an explicit understanding of the environmental context, limits the generalization ability of the dynamics model. To address this issue, we propose a Prototypical Context-Aware Dynamics (ProtoCAD) model, which captures local dynamics through a time-consistent latent context and enables dynamics generalization in high-dimensional control tasks. ProtoCAD extracts useful contextual information with the help of prototypes clustered over the batch, and it benefits model-based RL in two ways: 1) it uses a temporally consistent prototypical regularizer that encourages the prototype assignments produced for different time segments of the same latent trajectory to agree, rather than comparing their features directly; 2) it introduces a context representation that combines the projection embedding of latent states with aggregated prototypes and can significantly improve dynamics generalization. Extensive experiments show that ProtoCAD surpasses existing methods in dynamics generalization. Compared with the recurrent model RSSM, ProtoCAD improves mean and median performance by 13.2% and 26.7%, respectively, across all dynamics generalization tasks.
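The abstract gives no equations, so the following is only a minimal NumPy sketch of how the two ingredients it names might look, not the authors' implementation. It assumes soft prototype assignments computed as a softmax over cosine similarities, a cross-entropy consistency loss between two time segments of the same trajectory, and a context vector formed by concatenating the projection with an assignment-weighted average of the prototypes. All function names, the `temperature` parameter, and the randomly drawn prototypes (standing in for prototypes clustered over the batch) are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def prototype_assignments(z, prototypes, temperature=0.1):
    # Soft assignment of each projected latent state to each prototype
    # (assumed form: softmax over temperature-scaled cosine similarity).
    sims = l2_normalize(z) @ l2_normalize(prototypes).T  # (batch, K)
    return softmax(sims / temperature)

def temporal_consistency_loss(z_early, z_late, prototypes, temperature=0.1):
    # Cross-entropy between the assignments of two time segments of the
    # same trajectory; it compares assignments, not the raw features.
    target = prototype_assignments(z_early, prototypes, temperature)
    pred = prototype_assignments(z_late, prototypes, temperature)
    return float(-(target * np.log(pred + 1e-8)).sum(axis=-1).mean())

def context_representation(z, prototypes, temperature=0.1):
    # Concatenate the projection with an assignment-weighted prototype average.
    weights = prototype_assignments(z, prototypes, temperature)  # (batch, K)
    aggregated = weights @ prototypes                            # (batch, dim)
    return np.concatenate([z, aggregated], axis=-1)              # (batch, 2 * dim)

# Toy data: 4 hypothetical prototypes and 16 projected latent states of size 8.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 8))
z_early = rng.normal(size=(16, 8))
z_late = z_early + 0.01 * rng.normal(size=(16, 8))  # later segment, same trajectories
z_other = rng.normal(size=(16, 8))                  # unrelated trajectories

loss_same = temporal_consistency_loss(z_early, z_late, prototypes)
loss_other = temporal_consistency_loss(z_early, z_other, prototypes)
context = context_representation(z_early, prototypes)
```

In this toy setup the loss is lower for segments drawn from the same trajectory than for unrelated ones, which is the behavior a consistency regularizer of this kind is meant to reward. In a trained model the prototypes and projections would be learned jointly rather than sampled at random.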