Unsupervised domain adaptation (UDA) aims to bridge the domain shift between a labeled source domain and an unlabeled target domain. However, most existing works perform global-level feature alignment for semantic segmentation, while the local consistency between regions is largely neglected, making these methods less robust to changes in outdoor environments. Motivated by these observations, we propose a novel, fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR), for domain adaptive semantic segmentation. Our core idea is to pull similar regional features extracted from the same location of different images closer together, while pushing apart features from different locations of the two images. We propose momentum projector heads, in which the teacher projector is an exponential moving average of the student. In addition, we present a region-wise contrastive loss with two sampling strategies to realize effective regional consistency. Finally, a memory bank mechanism is designed to learn more robust and stable region-wise features under varying environments. Extensive experiments on two common UDA benchmarks, i.e., GTAV to Cityscapes and SYNTHIA to Cityscapes, demonstrate that our approach outperforms state-of-the-art methods.
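The two mechanisms named above, the EMA teacher projector and the region-wise contrastive objective, can be illustrated with a minimal sketch. The code below is not the paper's implementation: the function names, the NumPy representation of parameters and region features, and the momentum value are all illustrative assumptions; it only shows the standard EMA update rule and an InfoNCE-style loss in which region r of one view is the positive for region r of the other view and all remaining regions serve as negatives.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """EMA update of the teacher projector's parameters.

    teacher/student are hypothetical dicts of parameter arrays; the rule is
    teacher <- m * teacher + (1 - m) * student.
    """
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def region_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss over per-region features.

    feats_a, feats_b: (R, D) arrays of L2-normalised region embeddings from
    two views. Region r in feats_a is the positive pair of region r in
    feats_b (diagonal of the similarity matrix); all other regions act as
    negatives, which pulls matched regions together and pushes the rest apart.
    """
    logits = feats_a @ feats_b.T / temperature           # (R, R) similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives on diagonal
```

As a sanity check, perfectly matched views yield a much smaller loss than unrelated ones, since the diagonal similarities then dominate every row of the logit matrix.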