All famous machine learning algorithms, covering both supervised and semi-supervised learning, work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. It has therefore become necessary to develop approaches that reduce the need and the effort of obtaining new labeled samples, by exploiting data available in related areas and reusing them across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the capability of human beings to extrapolate knowledge across tasks in order to learn more efficiently. Despite the large variety of transfer learning scenarios, the main objective of this survey is to provide an overview of state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning called domain adaptation. In this sub-field, the data distribution is assumed to change between the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results on the domain adaptation problem, covering learning bounds established within different statistical learning frameworks.
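To make the same-distribution assumption concrete, the following minimal sketch (our illustration, not part of the survey) trains a standard classifier on data from a source domain and evaluates it on a target domain whose distribution has shifted while the task itself, the same binary classification problem, is unchanged. The synthetic Gaussian data, the shift magnitude, and the scikit-learn model are all illustrative assumptions.

```python
# Minimal sketch of the distribution-shift problem described above:
# a classifier fit on source-domain data loses accuracy on a target
# domain whose distribution has shifted. All modeling choices here
# are illustrative assumptions, not the survey's constructions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_domain(mean_shift, n=1000):
    # Two Gaussian classes in 2D; `mean_shift` translates the whole
    # domain, so the target distribution differs from the source one
    # while the label space and the classification task stay the same.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0 + mean_shift, scale=1.0, size=(n, 2))
    return X, y

X_src, y_src = sample_domain(mean_shift=0.0)  # source (training) domain
X_tgt, y_tgt = sample_domain(mean_shift=1.5)  # shifted target (test) domain

clf = LogisticRegression().fit(X_src, y_src)
print(f"source accuracy: {clf.score(X_src, y_src):.2f}")  # high
print(f"target accuracy: {clf.score(X_tgt, y_tgt):.2f}")  # markedly lower
```

Running this sketch shows the gap that domain adaptation theory seeks to bound: the decision boundary learned on the source domain falls in the wrong place for the shifted target domain, so accuracy drops without any change to the underlying task.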