We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally expensive adversarial objectives, network ensembles and style transfer. Instead, we employ standard data augmentation techniques (photometric noise, flipping and scaling) and enforce consistency of the semantic predictions across these image transformations. We develop this principle in a lightweight self-supervised framework trained on co-evolving pseudo labels, without the need for cumbersome extra training rounds. Simple to train from a practitioner's standpoint, our approach is remarkably effective. We achieve significant improvements over the state-of-the-art segmentation accuracy after adaptation, consistent across different choices of backbone architecture and adaptation scenario.
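The core idea of augmentation consistency can be illustrated with a minimal sketch. The snippet below is a hypothetical NumPy-only illustration, not the paper's implementation: predictions on a flipped view of the image are mapped back to the original frame and fused with the prediction on the untransformed view; the fused probability map then yields hard pseudo labels. The function names (`fuse_flip_consistent`, `toy_predict`) and the averaging-based fusion are assumptions made for exposition.

```python
import numpy as np

def fuse_flip_consistent(predict, image):
    """Fuse predictions across a horizontal flip.

    predict: function mapping an HxW image to HxWxC class probabilities.
    Returns an HxW map of hard pseudo labels from the fused probabilities.
    """
    p = predict(image)                # prediction on the original view
    p_flip = predict(image[:, ::-1])  # prediction on the flipped view
    p_flip = p_flip[:, ::-1]          # map the flipped prediction back
    fused = 0.5 * (p + p_flip)        # enforce consistency by averaging
    return fused.argmax(axis=-1)      # hard pseudo labels

# Toy stand-in for a segmentation network: two-class probabilities
# obtained by thresholding pixel intensity (illustration only).
def toy_predict(img):
    fg = (img > 0.5).astype(float)
    return np.stack([1.0 - fg, fg], axis=-1)

img = np.random.rand(4, 4)
labels = fuse_flip_consistent(toy_predict, img)  # 4x4 map of {0, 1}
```

In the full self-supervised framework described above, such fused maps would serve as the co-evolving pseudo labels supervising the network, with photometric noise and multi-scale views handled analogously to the flip shown here.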