In this work, we propose CLUDA, a simple yet novel method for unsupervised domain adaptation (UDA) in semantic segmentation that incorporates contrastive losses into a student-teacher learning paradigm, making use of pseudo-labels generated for the target domain by the teacher network. More specifically, we extract a multi-level fused feature map from the encoder and apply a contrastive loss across different classes and different domains via source-target mixing of images. We consistently improve performance across various feature-encoder architectures and domain adaptation datasets for semantic segmentation. Furthermore, we introduce a learned-weighted contrastive loss that improves upon a state-of-the-art multi-resolution training approach in UDA. We achieve state-of-the-art results on GTA $\rightarrow$ Cityscapes (74.4 mIOU, +0.6) and Synthia $\rightarrow$ Cityscapes (67.2 mIOU, +1.4). CLUDA effectively demonstrates contrastive learning in UDA as a generic method that can be easily integrated into any existing UDA approach for semantic segmentation. Please refer to the supplementary material for implementation details.
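To make the idea concrete, below is a minimal NumPy sketch of a class-conditional (InfoNCE-style) contrastive loss of the kind the abstract describes: feature vectors sampled from the fused encoder map are pulled together when they share a class label (ground truth for source pixels, teacher pseudo-labels for target pixels) and pushed apart otherwise. This is an illustrative sketch under our own assumptions, not CLUDA's exact formulation; the function name, sampling scheme, and temperature are hypothetical.

```python
import numpy as np

def class_contrastive_loss(features, labels, temperature=0.1):
    """InfoNCE-style class-conditional contrastive loss (illustrative sketch).

    features: (N, D) feature vectors, e.g. pixels sampled from the multi-level
              fused feature map of mixed source/target images.
    labels:   (N,) class ids (ground truth for source pixels, teacher
              pseudo-labels for target pixels).
    Same-class pixels (from either domain) act as positives; all other
    pixels act as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature          # (N, N) similarity logits
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs

    # numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    logsumexp = np.log(np.exp(sim - row_max).sum(axis=1)) + row_max[:, 0]
    log_prob = sim - logsumexp[:, None]

    pos_mask = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos_mask, False)
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                                  # anchors with >= 1 positive

    # negative mean log-probability over positives, averaged over valid anchors
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return -(pos_log_prob[valid] / n_pos[valid]).mean()
```

As a sanity check, features that cluster by class should yield a lower loss than the same features with shuffled labels, since shuffling turns cross-cluster pairs into (poorly matched) positives.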