The goal of this paper is to design active learning strategies that lead to domain adaptation under domain shift, assuming a Lipschitz labeling function. Building on the work of Mansour et al. (2009), we adapt the notion of discrepancy distance between source and target distributions by restricting the maximization over the hypothesis class to a localized class of functions that label the source domain accurately. We derive generalization error bounds for such active learning strategies in terms of Rademacher averages and the localized discrepancy, for general loss functions satisfying a regularity condition. From these theoretical bounds we infer practical algorithms: one based on greedy optimization, the other on K-medoids clustering. We also provide improved versions of both algorithms to handle large data sets. As our numerical experiments show, these algorithms are competitive with other state-of-the-art active learning techniques for domain adaptation, in particular on large data sets of around one hundred thousand images.
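As a rough illustration of the K-medoids idea mentioned above (selecting k representative points that minimize total distance to the rest), here is a minimal sketch in plain Python. The function name, the alternating assign/update scheme, and the toy 1-D data are illustrative assumptions, not the paper's actual selection algorithm.

```python
import random

def k_medoids(points, k, dist, n_iter=100, seed=0):
    """Naive K-medoids: alternate between assigning points to their
    nearest medoid and re-electing each cluster's best representative."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        # assignment step: attach every point to its closest medoid
        clusters = {m: [] for m in medoids}
        for i in range(len(points)):
            nearest = min(medoids, key=lambda m: dist(points[i], points[m]))
            clusters[nearest].append(i)
        # update step: within each cluster, pick the member that
        # minimizes the total distance to the other members
        new_medoids = []
        for members in clusters.values():
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[j])
                                         for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # converged: the medoid set is stable
        medoids = new_medoids
    return sorted(medoids)

# toy usage: two well-separated 1-D clusters; the selected medoids
# are the central points of each cluster
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
selected = k_medoids(points, 2, lambda a, b: abs(a - b))
```

In an active-learning setting, the selected medoids would be the samples submitted for labeling; scalable variants (as the abstract suggests for large data sets) typically subsample candidates or cache pairwise distances.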