An increasing number of applications rely on data-driven models that are deployed for perception tasks across a sequence of scenes. Due to the mismatch between training and deployment data, adapting the model to the new scenes is often crucial for good performance. In this work, we study continual multi-scene adaptation for semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on previous scenes must be maintained. We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model, and then using the view-consistent rendered semantic labels as pseudo-labels to adapt the model. Through joint training with the segmentation model, the Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Moreover, owing to its compact size, it can be stored in a long-term memory and subsequently used to render data from arbitrary viewpoints to reduce forgetting. We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.
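To make the adaptation loop described above concrete, the following is a minimal sketch, assuming a PyTorch setup. The names `SegModel` and `adapt_on_scene`, the toy network, and the way replay data are passed in are hypothetical placeholders, not the authors' implementation; the sketch only illustrates adapting a segmentation model with NeRF-rendered pseudo-labels from the current scene while replaying renders from stored per-scene NeRFs to reduce forgetting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegModel(nn.Module):
    """Toy 2D segmentation network standing in for the deployed model."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.net = nn.Conv2d(3, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return self.net(x)

def adapt_on_scene(seg_model, scene_images, nerf_pseudo_labels,
                   replay_images, replay_labels, lr=1e-4, steps=100):
    """Fine-tune the segmentation model on view-consistent pseudo-labels
    rendered by the current scene's Semantic-NeRF, while replaying
    renderings from previously stored NeRFs to limit forgetting.
    (Illustrative sketch; data loading and NeRF rendering are assumed
    to happen outside this function.)"""
    opt = torch.optim.Adam(seg_model.parameters(), lr=lr)
    for step in range(steps):
        # Pseudo-label loss on the current scene.
        i = step % len(scene_images)
        logits = seg_model(scene_images[i].unsqueeze(0))
        loss = F.cross_entropy(logits, nerf_pseudo_labels[i].unsqueeze(0))

        # Replay loss on data rendered from long-term-memory NeRFs.
        if len(replay_images) > 0:
            j = step % len(replay_images)
            r_logits = seg_model(replay_images[j].unsqueeze(0))
            loss = loss + F.cross_entropy(r_logits, replay_labels[j].unsqueeze(0))

        opt.zero_grad()
        loss.backward()
        opt.step()
    return seg_model
```

In this sketch, `scene_images` are frames from the current scene, `nerf_pseudo_labels` are the corresponding semantic maps rendered by that scene's Semantic-NeRF, and `replay_images`/`replay_labels` are renderings from NeRFs of earlier scenes kept in memory; the single joint loss reflects the idea that replayed renders can come from arbitrary viewpoints rather than stored frames.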