In unsupervised domain adaptation (UDA), a model trained on source data (e.g., synthetic) is adapted to target data (e.g., real-world) without access to target annotations. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain, as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percentage points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.