Semantic occupancy has emerged as a powerful representation in world models for its ability to capture rich spatial semantics. However, most existing occupancy world models rely on static, fixed embeddings or grids, which inherently limits the flexibility of perception. Moreover, their "in-place classification" over grids is potentially misaligned with the dynamic and continuous nature of real scenes. In this paper, we propose SparseWorld, a novel 4D occupancy world model that is flexible, adaptive, and efficient, powered by sparse and dynamic queries. We propose a Range-Adaptive Perception module, in which learnable queries are modulated by the ego vehicle states and enriched with spatio-temporal associations to enable extended-range perception. To effectively capture the dynamics of the scene, we design a State-Conditioned Forecasting module, which replaces classification-based forecasting with a regression-guided formulation, precisely aligning the dynamic queries with the continuity of the 4D environment. In addition, we devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and efficient training. Extensive experiments demonstrate that SparseWorld achieves state-of-the-art performance across perception, forecasting, and planning tasks. Comprehensive visualizations and ablation studies further validate the advantages of SparseWorld in terms of flexibility, adaptability, and efficiency. The code is available at https://github.com/MSunDYY/SparseWorld.
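To make the contrast between "in-place classification" over a grid and a regression-guided formulation more concrete, the following is a minimal illustrative sketch and not the SparseWorld implementation. It assumes that each sparse query carries a continuous 3D position and a semantic label, and that forecasting amounts to regressing a continuous displacement for each query before rasterizing to a grid. The function names `voxelize` and `forecast_by_regression`, the per-query `velocities` array, and all grid parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def voxelize(points, labels, grid_shape, voxel_size, origin):
    """Scatter labeled 3D points into a dense semantic grid (0 means empty)."""
    grid = np.zeros(grid_shape, dtype=np.int64)
    # Continuous coordinates -> integer voxel indices.
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, kept = idx[inside], labels[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = kept
    return grid


def forecast_by_regression(points, velocities, dt):
    """Hypothetical regression-style step: move each query by a continuous offset.

    In a learned model the displacement would be predicted by a network;
    here it is simply velocity * dt so the example stays self-contained.
    """
    return points + velocities * dt


# One query near a voxel boundary, labeled with (arbitrary) class id 3.
points = np.array([[0.9, 0.5, 0.5]])
labels = np.array([3])
velocities = np.array([[0.5, 0.0, 0.0]])  # assumed regressed motion, m/s

moved = forecast_by_regression(points, velocities, dt=0.5)
grid_now = voxelize(points, labels, (4, 4, 4), 1.0, np.zeros(3))
grid_next = voxelize(moved, labels, (4, 4, 4), 1.0, np.zeros(3))
```

In this sketch the query keeps its sub-voxel position between steps, so small motions accumulate continuously and only the final rasterization is discrete. A grid-based classifier would instead have to re-predict the class of every fixed cell at every step.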