Domain generalization involves learning a classifier from a heterogeneous collection of training sources such that it generalizes to data drawn from similar unknown target domains, with applications in large-scale learning and personalized inference. In many settings, privacy concerns prohibit obtaining domain labels for the training samples, so only an aggregated collection of training points is available. Existing approaches that use domain labels to create domain-invariant feature representations are inapplicable in this setting, which calls for alternative approaches to learning generalizable classifiers. In this paper, we propose a domain-adaptive approach to this problem, which operates in two steps: (a) we cluster training data within a carefully chosen feature space to create pseudo-domains, and (b) using these pseudo-domains, we learn a domain-adaptive classifier that makes predictions using information about both the input and the pseudo-domain it belongs to. Our approach achieves state-of-the-art performance on a variety of domain generalization benchmarks without using any domain labels. Furthermore, we provide novel theoretical guarantees on domain generalization using cluster information. Our approach is amenable to ensemble-based methods and provides substantial gains even on large-scale benchmark datasets. The code can be found at: https://github.com/xavierohan/AdaClust_DomainBed
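The two steps can be illustrated with a minimal sketch. It is not the implementation from the linked repository. It makes several assumptions: plain k-means stands in for the clustering step, the raw toy features stand in for the "carefully chosen feature space", and the cluster centroid is used as the pseudo-domain information appended to each input. The names `kmeans` and `augment` are hypothetical helpers introduced only for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: the nearest centroid gives the pseudo-domain id.
        assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def augment(points, centroids, assign):
    """Step (b) input: concatenate each vector with its pseudo-domain centroid
    (an assumed choice of pseudo-domain information)."""
    return [list(p) + centroids[a] for p, a in zip(points, assign)]

# Toy pooled training features from two sources, with no domain labels.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]

# Step (a): cluster the pooled data to obtain pseudo-domains.
centroids, pseudo_domains = kmeans(features, k=2)

# Step (b): build inputs carrying both the sample and its pseudo-domain.
augmented = augment(features, centroids, pseudo_domains)
```

Under these assumptions, any standard classifier could then be trained on `augmented` rather than on `features`; at test time, a new point would be assigned to its nearest centroid before being augmented in the same way.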