We present a novel unsupervised domain adaptation method for semantic segmentation that generalizes a model trained on source images and their corresponding ground-truth labels to a target domain. The key to domain-adaptive semantic segmentation is learning domain-invariant and discriminative features without target ground-truth labels. To this end, we propose a bi-directional pixel-prototype contrastive learning framework that minimizes intra-class variations of features for the same object class, while maximizing inter-class variations for different ones, regardless of domain. Specifically, our framework aligns pixel-level features in target images with a prototype of the same object class from source images (i.e., positive pairs), sets them apart for different classes (i.e., negative pairs), and performs the same alignment and separation in the opposite direction, matching pixel-level features in source images with prototypes from target images. The cross-domain matching encourages domain-invariant feature representations, while the bidirectional pixel-prototype correspondences aggregate features of the same object class, providing discriminative features. To establish training pairs for contrastive learning, we propose to generate dynamic pseudo labels of target images using a non-parametric label transfer, that is, pixel-prototype correspondences across different domains. We also present a calibration method that gradually compensates for class-wise domain biases of prototypes during training.
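The bidirectional pixel-prototype contrastive objective described above can be sketched as an InfoNCE-style loss between pixel features and class prototypes. The following is a minimal NumPy illustration, not the paper's implementation; the function names, the cosine-similarity choice, and the temperature value are assumptions.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    # features: (N, D) pixel features, labels: (N,) class ids.
    # A prototype is the mean feature of all pixels in a class.
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def pixel_prototype_contrastive(features, labels, protos, temperature=0.1):
    # Cosine similarity between every pixel feature and every prototype,
    # then a softmax cross-entropy that pulls each pixel toward the
    # prototype of its own class (positive pair) and pushes it away from
    # prototypes of other classes (negative pairs).
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    logits = (f @ p.T) / temperature                 # (N, C)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-8).mean()
```

The bidirectional form would then sum two such terms: source pixels against target prototypes, and target pixels (with pseudo labels) against source prototypes, which is what encourages cross-domain alignment.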
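The non-parametric label transfer and the prototype calibration can likewise be sketched: each target pixel receives the class of its most similar source prototype, and a class-wise source-to-target offset is tracked gradually with an exponential moving average. This is a hedged illustration; the helper names and the momentum value are assumptions, not the paper's.

```python
import numpy as np

def pseudo_labels(tgt_features, src_prototypes):
    # Non-parametric label transfer: assign each target pixel the class of
    # the source prototype with the highest cosine similarity.
    f = tgt_features / (np.linalg.norm(tgt_features, axis=1, keepdims=True) + 1e-8)
    p = src_prototypes / (np.linalg.norm(src_prototypes, axis=1, keepdims=True) + 1e-8)
    return (f @ p.T).argmax(axis=1)

def calibrate_prototype(src_proto, tgt_proto_batch, bias, momentum=0.99):
    # Track the class-wise target-source offset with an EMA and use it to
    # correct the source prototype toward the target domain.
    new_bias = momentum * bias + (1 - momentum) * (tgt_proto_batch - src_proto)
    return src_proto + new_bias, new_bias
```

Because the pseudo labels are recomputed from the current features and prototypes at each step, they are dynamic in the sense the abstract describes, rather than fixed once before training.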