3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal task for autonomous driving, as it involves predicting per-voxel occupancy within a 3D scene from partial LiDAR or image inputs. Existing methods primarily focus on the voxel-wise feature aggregation, while neglecting the instance-centric semantics and broader context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts) for SSC, which completes the scene volume from a sparse set of instance queries derived from the input with context awareness. By incorporating the queries as the instance feature representations within the scene, Symphonies dynamically encodes the instance-centric semantics to interact with the image and volume features while avoiding the dense voxel-wise modeling. Simultaneously, it orchestrates a more comprehensive understanding of the scenario by capturing context throughout the entire scene, contributing to alleviating the geometric ambiguity derived from occlusion and perspective errors. Symphonies achieves a state-of-the-art result of 13.02 mIoU on the challenging SemanticKITTI dataset, outperforming existing methods and showcasing the promising advancements of the paradigm. The code is available at \url{https://github.com/hustvl/Symphonies}.
翻译:暂无翻译