Heterogeneous unsupervised domain adaptation (HUDA) is the most challenging domain adaptation setting, in which the feature spaces differ between the source and target domains and the target domain has only unlabeled data. Existing HUDA methods assume that both positive and negative examples are available in the source domain, an assumption that may not hold in some real applications. This paper addresses a new, challenging setting called positive and unlabeled heterogeneous domain adaptation (PU-HDA), a HUDA setting in which the source domain has only positive examples. PU-HDA can also be viewed as an extension of PU learning in which the positive and unlabeled examples are sampled from different domains. A naive combination of existing HUDA and PU learning methods is ineffective in PU-HDA because of the gap in label distribution between the source and target domains. To overcome this issue, we propose a novel method, positive-adversarial domain adaptation (PADA), which predicts likely positive examples from the unlabeled target data and simultaneously aligns the feature spaces to reduce the distribution divergence between the whole source data and the likely positive target data. PADA achieves this with a unified adversarial training framework that learns a classifier to predict positive examples and a feature transformer to map the target feature space to that of the source. Specifically, both are trained to fool a common discriminator that determines whether the likely positive examples come from the target or the source domain. We experimentally show that PADA outperforms several baseline methods, such as the naive combination of HUDA and PU learning.
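To make the adversarial structure concrete, below is a minimal sketch of one training step in the spirit of the framework described above: a classifier scores target examples as likely positives, a feature transformer maps target features into the source feature space, and both are updated to fool a shared discriminator that separates source positives from transformed likely-positive target examples. All network sizes, loss weights, and the soft weighting used to select likely positives are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a PADA-style adversarial step (assumed details, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_src, d_tgt = 50, 30  # hypothetical source/target feature dimensions

# Classifier: scores how likely an unlabeled target example is positive.
classifier = nn.Sequential(nn.Linear(d_tgt, 64), nn.ReLU(), nn.Linear(64, 1))
# Feature transformer: maps target features into the source feature space.
transformer = nn.Sequential(nn.Linear(d_tgt, 64), nn.ReLU(), nn.Linear(64, d_src))
# Common discriminator: source positives vs. transformed likely-positive target examples.
discriminator = nn.Sequential(nn.Linear(d_src, 64), nn.ReLU(), nn.Linear(64, 1))

opt_fc = torch.optim.Adam(
    list(classifier.parameters()) + list(transformer.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_step(x_src_pos, x_tgt_unlabeled):
    """One adversarial update on a batch of source positives and unlabeled target data."""
    # Soft "likely positive" weights for target examples from the classifier.
    w = torch.sigmoid(classifier(x_tgt_unlabeled))       # shape (B, 1)
    z = transformer(x_tgt_unlabeled)                      # target -> source feature space

    # Discriminator step: source positives -> 1, weighted transformed target examples -> 0.
    d_src_logit = discriminator(x_src_pos)
    d_tgt_logit = discriminator(z.detach())
    loss_d = F.binary_cross_entropy_with_logits(d_src_logit, torch.ones_like(d_src_logit)) \
        + (w.detach() * F.binary_cross_entropy_with_logits(
            d_tgt_logit, torch.zeros_like(d_tgt_logit), reduction="none")).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Classifier + transformer step: make the discriminator label the likely-positive
    # transformed target examples as source, aligning them with the source positives.
    d_tgt_logit = discriminator(transformer(x_tgt_unlabeled))
    loss_fc = (w * F.binary_cross_entropy_with_logits(
        d_tgt_logit, torch.ones_like(d_tgt_logit), reduction="none")).mean()
    opt_fc.zero_grad()
    loss_fc.backward()
    opt_fc.step()
    return loss_d.item(), loss_fc.item()

# Usage with random stand-in tensors (no real datasets are assumed here).
print(train_step(torch.randn(16, d_src), torch.randn(16, d_tgt)))
```

In this sketch the classifier's output weights the adversarial loss, so only examples it deems likely positive are pushed toward the source distribution; how the actual method selects or weights likely positives, and how it avoids the trivial solution of weighting nothing, follows the paper's own objective.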