In this paper, we study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation. Previous works on SFDA mainly focus on aligning the cross-domain distributions. However, they overlook the generalization ability of the pretrained source model, which largely determines the initial target outputs that are vital to the target adaptation stage. To address this, we make the interesting observation that model accuracy is highly correlated with whether attention is focused on the objects in an image. To this end, we propose a generic and effective framework based on the Transformer, named TransDA, for learning a generalized model for SFDA. Specifically, we apply the Transformer as the attention module and inject it into a convolutional network. By doing so, the model is encouraged to turn its attention towards the object regions, which effectively improves the model's generalization ability on the target domains. Moreover, a novel self-supervised knowledge distillation approach is proposed to adapt the Transformer with target pseudo-labels, thus further encouraging the network to focus on the object regions. Experiments on three domain adaptation tasks, including closed-set, partial-set, and open-set adaptation, demonstrate that TransDA can greatly improve the adaptation accuracy and produce state-of-the-art results. The source code and trained models are available at https://github.com/ygjwd12345/TransDA.
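The abstract names two components: a Transformer attention module injected into a convolutional backbone, and self-supervised knowledge distillation from target pseudo-labels. The following is a minimal sketch of that idea, not the authors' released implementation (see the linked repository for that): it assumes a ResNet-50 backbone, a single Transformer encoder layer over flattened spatial tokens, and hard pseudo-labels produced by an EMA teacher; the names `TransDANet`, `distill_step`, and `ema_update` are hypothetical.

```python
# Sketch only: a TransDA-style model under the assumptions stated above.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class TransDANet(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 2048, depth: int = 1):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep the conv stages; drop the average pool and the fc head.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feat = self.cnn(x)                        # (B, C, H, W) conv features
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) spatial tokens
        tokens = self.transformer(tokens)         # self-attention over regions
        pooled = tokens.mean(dim=1)               # average over spatial tokens
        return self.classifier(pooled)

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)

def distill_step(student, teacher, x, optimizer):
    # Self-distillation on unlabeled target data: the teacher's predictions
    # serve as pseudo-labels for the student (a common SFDA recipe; the
    # paper's exact loss may differ).
    with torch.no_grad():
        pseudo = teacher(x).argmax(dim=1)
    loss = F.cross_entropy(student(x), pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()

# Usage sketch:
# student = TransDANet(num_classes=65)
# teacher = copy.deepcopy(student).eval()
```

Flattening the final feature map into a sequence of spatial tokens lets the self-attention layers reweight image regions globally, which is one plausible way such a module could push attention toward object regions as the abstract describes.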