Machine learning systems generally assume that the training and testing distributions are the same; in real-world applications, however, this assumption often fails to hold. A key requirement is therefore to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting in which one or several different but related domains are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been made in this area over the years. This paper presents the first review of recent advances in domain generalization. First, we provide a formal definition of domain generalization and discuss several related fields. Second, we thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. Third, we categorize recent algorithms into three classes: data manipulation, representation learning, and learning strategy, and present several popular algorithms in detail for each category. Fourth, we introduce the commonly used datasets, applications, and our open-sourced codebase for fair evaluation. Finally, we summarize the existing literature and present some potential research topics for the future.