Domain adaptation is one of the central problems in machine learning. Unlike past theoretical work, we consider a new model for subpopulation shift in the input or representation space. In this work, we propose a provably effective framework for domain adaptation based on label propagation. Our analysis relies on a simple but realistic expansion assumption proposed by \citet{wei2021theoretical}. Using a teacher classifier trained on the source domain, our algorithm not only propagates labels to the target domain but also improves upon the teacher. By leveraging existing generalization bounds, we also obtain end-to-end finite-sample guarantees for the entire algorithm. In addition, we extend our theoretical framework to a more general setting of source-to-target transfer based on a third, unlabeled dataset, which can be easily applied in various learning scenarios. Inspired by our theory, we adapt consistency-based semi-supervised learning methods to domain adaptation settings and obtain significant improvements.
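To make the teacher-then-propagate idea concrete, the following is a minimal toy sketch and not the algorithm analyzed in this work. It assumes a hypothetical one-dimensional threshold `teacher` standing in for a classifier fit on the source domain, and it replaces consistency regularization with a simple majority vote over radius neighborhoods. All names (`teacher`, `propagate`, `radius`) and the data points are illustrative assumptions.

```python
def teacher(x):
    """Hypothetical teacher: a threshold rule assumed to have been fit on source data."""
    return 1 if x > 0.0 else 0


def propagate(points, labels, radius, rounds=5):
    """Refine pseudo-labels by majority vote among neighbors within `radius`.

    Each round updates all labels synchronously; ties keep the current label.
    This is a toy stand-in for enforcing label consistency within neighborhoods.
    """
    labels = list(labels)
    for _ in range(rounds):
        new_labels = []
        for x, y in zip(points, labels):
            # Collect the current labels of all points near x (including x itself).
            votes = [labels[j] for j, z in enumerate(points) if abs(x - z) <= radius]
            ones = sum(votes)
            zeros = len(votes) - ones
            if ones > zeros:
                new_labels.append(1)
            elif zeros > ones:
                new_labels.append(0)
            else:
                new_labels.append(y)
        if new_labels == labels:  # converged, nothing changed this round
            break
        labels = new_labels
    return labels


def accuracy(pred, true):
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Illustrative target domain: the class-0 subpopulation has shifted so that
# two of its points (0.1 and 0.2) fall on the wrong side of the teacher's threshold.
target_x = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.1, 0.2, 2.0, 2.2, 2.4, 2.6, 2.8]
target_y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # ground truth, used only for evaluation

pseudo = [teacher(x) for x in target_x]             # teacher's pseudo-labels
refined = propagate(target_x, pseudo, radius=0.95)  # labels after propagation

print("teacher accuracy:", accuracy(pseudo, target_y))
print("refined accuracy:", accuracy(refined, target_y))
```

In this toy setup the mislabeled points are a minority inside their own neighborhoods, so the vote overrides the teacher there. This is only a loose analogue of the role that the expansion assumption plays in the analysis.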