Domain adaptation assumes that samples from the source and target domains are freely accessible during training. However, this assumption is rarely plausible in the real world and can cause data-privacy issues, especially when source-domain labels are sensitive attributes that can serve as identifiers. To avoid accessing source data that may contain sensitive information, we introduce Source data-Free Domain Adaptation (SFDA). Our key idea is to leverage a model pre-trained on the source domain and progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy, as measured by the pre-trained source model, are more likely to be classified correctly. Based on this observation, we select reliable samples using the self-entropy criterion and define them as class prototypes. We then assign a pseudo label to every target sample based on its similarity score with the class prototypes. Furthermore, to reduce the uncertainty of the pseudo-labeling process, we propose a set-to-set distance-based filtering scheme that requires no tunable hyperparameters. Finally, we train the target model on the filtered pseudo labels, with regularization from the pre-trained source model. Surprisingly, without directly using labeled source samples, our method, PrDA, outperforms conventional domain adaptation methods on benchmark datasets. Our code is publicly available at https://github.com/youngryan1993/SFDA-SourceFreeDA
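The entropy-based prototype selection and prototype-similarity pseudo-labeling steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the abstract does not state:

- Reliable samples are chosen per class, as the `k` lowest-entropy target samples that the source model assigns to that class.
- Each prototype is the mean feature vector of those samples.
- Similarity is measured with cosine similarity.

The names `self_entropy`, `select_reliable`, `pseudo_label`, and `k` are hypothetical. The progressive schedule, the set-to-set filtering step, and the training loop are not shown.

```python
import math
from typing import List, Optional, Sequence


def self_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one softmax output; lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_reliable(probs: List[List[float]], k: int) -> List[List[int]]:
    """Hypothetical per-class top-k rule: for each class, keep the indices of
    the k lowest-entropy samples that the source model predicts as that class."""
    num_classes = len(probs[0])
    by_class: List[List[int]] = [[] for _ in range(num_classes)]
    for i, p in enumerate(probs):
        predicted = max(range(num_classes), key=lambda c: p[c])
        by_class[predicted].append(i)
    return [sorted(idx, key=lambda i: self_entropy(probs[i]))[:k] for idx in by_class]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pseudo_label(features: List[List[float]], probs: List[List[float]], k: int) -> List[int]:
    """Assign each target sample to the class whose prototype (assumed here to be
    the mean feature of that class's reliable samples) is most similar."""
    reliable = select_reliable(probs, k)
    prototypes: List[Optional[List[float]]] = [
        mean_vector([features[i] for i in idx]) if idx else None for idx in reliable
    ]
    labels = []
    for f in features:
        # A class with no reliable samples gets -inf, so it is never chosen.
        scores = [cosine(f, proto) if proto is not None else -math.inf for proto in prototypes]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy data: source-model softmax outputs and target features for 4 samples.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
features = [[1.0, 0.0], [0.2, 1.0], [0.0, 1.0], [0.9, 0.1]]
print(pseudo_label(features, probs, k=1))  # -> [0, 1, 1, 0]
```

In this toy data, samples 1 and 3 end up with labels that differ from the source model's argmax. Relabeling against prototypes built only from confident samples is meant to correct exactly this kind of low-confidence prediction.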
翻译:校内适应假设,源和目标域的样本在培训阶段可以自由获取。然而,在现实世界中,这种假设很少可信,并可能造成数据隐私问题,特别是当源域的标签可能是一个敏感属性作为识别符号时。为避免获取可能包含敏感信息的源数据,我们引入了源数据-无域适应(SFDA) 。我们的关键想法是利用源域预先培训的模型,并逐步以自学的方式更新目标模型。我们观察到,通过预先培训的源模型测量的自成一体程度较低的目标样本更有可能被正确分类。我们从中选择自成一体标准可靠的样本,并将这些样本定义为类原型。我们随后根据类原型的类似性评分为每个目标样本指定假标签。此外,为了减少假标签过程中的不确定性,我们建议采用固定的远程过滤器,不需要任何金枪鱼可选的超参数。最后,我们用过滤的伪标签模型进行分类,从预选的自成品标准/自成型数据库模型中进行正规化处理。我们现有的常规数据库源,在常规数据库中不采用常规源。