Networks trained for domain adaptation are prone to bias toward easy-to-transfer classes. Since ground-truth labels on the target domain are unavailable during training, this bias leads to skewed predictions that forget hard-to-transfer classes. To address this problem, we propose Cross-domain Moving Object Mixing (CMOM), which cuts several objects, including hard-to-transfer classes, from the source-domain video clip and pastes them into the target-domain video clip. Unlike image-level domain adaptation, the temporal context must be maintained when mixing moving objects across two different videos. Therefore, we design CMOM to mix consecutive video frames, so that unrealistic movements do not occur. We additionally propose Feature Alignment with Temporal Context (FATC) to enhance target-domain feature discriminability. FATC exploits the robust source-domain features, which are trained with ground-truth labels, to learn discriminative target-domain features in an unsupervised manner by filtering out unreliable predictions with temporal consensus. We demonstrate the effectiveness of the proposed approaches through extensive experiments. In particular, our model reaches a mIoU of 53.81% on the VIPER to Cityscapes-Seq benchmark and a mIoU of 56.31% on the SYNTHIA-Seq to Cityscapes-Seq benchmark, surpassing the state-of-the-art methods by large margins. The code is available at: https://github.com/kyusik-cho/CMOM.
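To make the mixing step concrete, the following is a minimal sketch of the CMOM cut-and-paste operation, not the paper's implementation: it assumes clips are NumPy arrays and applies the same set of pasted classes to every frame, so the pasted objects keep their source-domain motion across consecutive frames. The function name `cmom_mix` and the `ignore_label` convention are illustrative assumptions.

```python
import numpy as np

def cmom_mix(src_clip, src_labels, tgt_clip, classes_to_paste, ignore_label=255):
    """Sketch of Cross-domain Moving Object Mixing (CMOM).

    Pastes pixels of the selected (e.g. hard-to-transfer) classes from a
    source-domain clip into a target-domain clip frame by frame; keeping the
    same class set in every frame preserves the pasted objects' motion.

    src_clip, tgt_clip: (T, H, W, 3) uint8 video clips of equal shape
    src_labels:         (T, H, W) integer per-frame source label maps
    classes_to_paste:   iterable of class ids to cut from the source clip
    Returns the mixed clip and its label map; non-pasted pixels are marked
    ignore_label, since target ground truth is unavailable during training.
    """
    mixed_clip = tgt_clip.copy()
    mixed_labels = np.full(src_labels.shape, ignore_label, dtype=src_labels.dtype)
    for t in range(src_clip.shape[0]):
        # Per-frame binary mask of the chosen source classes.
        mask = np.isin(src_labels[t], list(classes_to_paste))
        mixed_clip[t][mask] = src_clip[t][mask]
        mixed_labels[t][mask] = src_labels[t][mask]
    return mixed_clip, mixed_labels
```

In practice the pasted class set would be sampled per clip (biased toward hard-to-transfer classes), but the key point shown here is that one consistent mask source (the source-domain labels) is used for all frames of the clip.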
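The temporal-consensus filtering used by FATC can likewise be sketched. This is a hedged illustration, not the paper's exact criterion: it assumes softmax maps as NumPy arrays, that the previous frame's prediction has already been warped to the current frame (warping is outside this sketch), and an illustrative confidence threshold. A pixel's pseudo-label is kept only when consecutive frames agree on the class and the current prediction is confident.

```python
import numpy as np

def temporal_consensus_mask(probs_t, probs_warped_prev, threshold=0.9):
    """Sketch of pseudo-label filtering via temporal consensus (FATC-style).

    probs_t:           (C, H, W) softmax output for frame t
    probs_warped_prev: (C, H, W) softmax for frame t-1, warped to frame t
    threshold:         assumed confidence cutoff (illustrative value)

    Returns the per-pixel pseudo-labels for frame t and a boolean mask of
    pixels deemed reliable: the two frames predict the same class and the
    current frame's confidence exceeds the threshold.
    """
    pred_t = probs_t.argmax(axis=0)
    pred_prev = probs_warped_prev.argmax(axis=0)
    conf_t = probs_t.max(axis=0)
    reliable = (pred_t == pred_prev) & (conf_t > threshold)
    return pred_t, reliable
```

Only the pixels flagged reliable would contribute to aligning target-domain features with the label-supervised source-domain features; the rest are discarded as unreliable.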