Domain adaptation is one of the central problems in machine learning. In this work, we propose a provably effective framework for domain adaptation based on label propagation. Unlike past theoretical work, we consider a new model of subpopulation shift in the input or representation space. Our analysis relies on a simple but realistic ``expansion'' assumption proposed by \citet{wei2021theoretical}. Starting from a teacher classifier trained on the source domain, our algorithm not only propagates labels to the target domain but also improves upon the teacher. By leveraging existing generalization bounds, we also obtain end-to-end finite-sample guarantees for the entire algorithm. In addition, we extend our theoretical framework to a more general setting of source-to-target transfer based on a third unlabeled dataset, which can be applied in a variety of learning scenarios.
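For readers unfamiliar with the expansion assumption, one commonly stated form is sketched here. The notation is an assumption introduced for illustration and does not appear in the abstract itself: $P_i$ denotes the distribution of class $i$ over the input space $\mathcal{X}$, $\mathcal{N}(V)$ denotes the neighborhood of a set $V \subseteq \mathcal{X}$ (for example, the points reachable from $V$ through small perturbations or data augmentations), and $a \in (0,1)$ and $c > 1$ are constants. Under these assumptions, $P_i$ is said to satisfy $(a, c)$-expansion if
\[
  P_i\bigl(\mathcal{N}(V)\bigr) \;\ge\; \min\bigl\{\, c \, P_i(V),\; 1 \,\bigr\}
  \qquad \text{for all } V \subseteq \mathcal{X} \text{ with } P_i(V) \le a .
\]
Intuitively, every sufficiently small set has a neighborhood carrying noticeably more probability mass, so a small region of teacher errors is surrounded by correctly labeled points from which labels can propagate. The exact variant used in the analysis may differ from this sketch.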