The goal of this paper is to design active learning strategies that lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009), we adapt the concept of discrepancy distance between source and target distributions by restricting the maximization over the hypothesis class to a localized class of functions that achieve accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher averages and localized discrepancies, for general loss functions satisfying a regularity condition. A practical K-medoids algorithm that can handle large data sets is then inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive with other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.
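To make the K-medoids step concrete, the following is a minimal sketch of a generic alternating K-medoids routine, of the kind that could select representative points to query in an active learning setting. It is an illustrative implementation only: the paper's actual algorithm is derived from the localized discrepancy bounds, which this sketch does not reproduce, and the function name and parameters here are hypothetical.

```python
import numpy as np

def k_medoids(X, k, n_iter=50, seed=0):
    """Generic alternating K-medoids (illustrative, not the paper's
    exact algorithm): pick k data points (medoids) minimizing the
    total distance of all points to their nearest medoid."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distance matrix (O(n^2) memory; fine for a sketch).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # Update step: the new medoid is the cluster member with the
            # smallest summed distance to the rest of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return np.sort(medoids)
```

In an active learning loop, the indices returned by such a routine would designate the unlabeled points whose labels are requested from the oracle.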