Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain changes over time rather than remaining stationary. In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. The key to MM-CTTA is to adaptively attend to the reliable modality while avoiding catastrophic forgetting during continual domain shifts, which is beyond the capability of previous TTA and CTTA methods. To fill this gap, we propose an MM-CTTA method called Continual Cross-Modal Adaptive Clustering (CoMAC) that addresses the task from two perspectives. First, we propose an adaptive dual-stage mechanism that generates reliable cross-modal predictions by attending to the reliable modality based on the class-wise feature-centroid distance in the latent space. Second, to perform test-time adaptation without catastrophic forgetting, we design class-wise momentum queues that capture confident target features for adaptation while stochastically restoring pseudo-source features to revisit source knowledge. We further introduce two new benchmarks to facilitate future exploration of MM-CTTA. Our experimental results show that our method achieves state-of-the-art performance on both benchmarks.
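To make the idea of attending to the reliable modality by feature-centroid distance more concrete, the following is a minimal illustrative sketch, not the CoMAC implementation. It assumes two modalities (labelled 2D and 3D here), a set of per-class centroids for each modality, Euclidean distance, and a softmax over negative distances as the weighting rule. The function names, the `temperature` parameter, and the single-stage fusion are all hypothetical choices made for illustration.

```python
import math


def nearest_centroid_distance(feature, centroids):
    """Return (class_index, distance) for the class centroid closest to `feature`."""
    best_class, best_dist = -1, math.inf
    for cls, centroid in enumerate(centroids):
        d = math.dist(feature, centroid)  # Euclidean distance (assumed metric)
        if d < best_dist:
            best_class, best_dist = cls, d
    return best_class, best_dist


def fuse_predictions(feat_2d, feat_3d, probs_2d, probs_3d,
                     centroids_2d, centroids_3d, temperature=1.0):
    """Fuse two per-modality class distributions.

    Assumption: a modality whose feature lies closer to one of its own
    class centroids is treated as more reliable and receives a larger weight.
    """
    _, d2 = nearest_centroid_distance(feat_2d, centroids_2d)
    _, d3 = nearest_centroid_distance(feat_3d, centroids_3d)

    # Softmax over negative distances; subtracting the minimum keeps exp() stable.
    m = min(d2, d3)
    e2 = math.exp(-(d2 - m) / temperature)
    e3 = math.exp(-(d3 - m) / temperature)
    w2 = e2 / (e2 + e3)
    w3 = 1.0 - w2

    fused = [w2 * p2 + w3 * p3 for p2, p3 in zip(probs_2d, probs_3d)]
    return fused, (w2, w3)


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [4.0, 0.0]]  # hypothetical class centroids
    fused, weights = fuse_predictions(
        feat_2d=[0.1, 0.0], feat_3d=[2.0, 0.0],
        probs_2d=[0.9, 0.1], probs_3d=[0.4, 0.6],
        centroids_2d=centroids, centroids_3d=centroids,
    )
    print("weights (2D, 3D):", weights)
    print("fused prediction:", fused)
```

In this toy setup the 2D feature sits almost on a class centroid while the 3D feature lies midway between two centroids, so the fused prediction leans toward the 2D branch. A lower `temperature` would make the weighting more decisive.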