We consider the problem of active domain adaptation (ADA) to unlabeled target data, a subset of which is actively selected and labeled under a budget constraint. Motivated by recent analyses showing that label distribution mismatch between source and target domains is a critical issue in domain adaptation, we devise the first ADA method that addresses this issue. At its heart lies a novel sampling strategy, which selects target data that best approximate the entire target distribution while also being representative, diverse, and uncertain. The sampled target data are then used not only for supervised learning but also for matching the label distributions of the source and target domains, which leads to a remarkable performance improvement. On four public benchmarks, our method substantially outperforms existing methods in every adaptation scenario.
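To make the idea of matching label distributions concrete, here is a minimal sketch of one generic way labeled target samples could be used for it: estimate class priors on both domains and reweight the source loss by their ratio. This is standard class-prior importance reweighting, used here only for illustration and not the procedure of the method described above. The function names, the additive smoothing, and the normalization of the loss are all assumptions.

```python
import numpy as np


def estimate_label_distribution(labels, num_classes, smoothing=1.0):
    """Smoothed empirical class frequencies (hypothetical helper).

    Additive smoothing keeps classes that are missing from a small
    labeled budget from getting zero probability.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts += smoothing
    return counts / counts.sum()


def class_importance_weights(source_labels, sampled_target_labels, num_classes):
    """Per-class ratio p_target(y) / p_source(y) (assumed weighting scheme)."""
    p_src = estimate_label_distribution(source_labels, num_classes)
    p_tgt = estimate_label_distribution(sampled_target_labels, num_classes)
    return p_tgt / p_src


def reweighted_source_loss(per_sample_loss, source_labels, weights):
    """Weighted mean of source losses, so the source label distribution
    mimics the one estimated from the sampled target labels."""
    w = weights[source_labels]
    return float(np.sum(w * per_sample_loss) / np.sum(w))
```

In this sketch, the quality of the weights depends entirely on how well the labeled target subset reflects the full target label distribution, which is why a sampling strategy that approximates the entire target distribution matters.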