Domain Generalization (DG) aims to learn models that maintain high performance on unseen domains encountered at test time, using data from multiple related source domains. Many existing DG algorithms reduce the divergence between source distributions in a representation space, in the hope of aligning the unseen domain closely with the sources. This is motivated by analyses that explain generalization to unseen domains in terms of their distributional distance (such as the Wasserstein distance) to the sources. However, because the DG objective is open-ended, it is challenging to evaluate DG algorithms comprehensively using a few benchmark datasets. In particular, we demonstrate that the accuracy of models trained with DG methods varies significantly across unseen domains generated from popular benchmark datasets. This highlights that the performance of DG methods on a few benchmark datasets may not be representative of their performance on unseen domains in the wild. To overcome this roadblock, we propose a universal certification framework based on distributionally robust optimization (DRO) that can efficiently certify the worst-case performance of any DG method. This enables a data-independent evaluation of a DG method that is complementary to empirical evaluations on benchmark datasets. Furthermore, we propose a training algorithm that can be used with any DG method to provably improve its certified performance. Our empirical evaluation demonstrates the effectiveness of our method at significantly improving the worst-case loss (i.e., reducing the risk of failure of these models in the wild) without incurring a significant performance drop on benchmark datasets.
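To make the notion of a certified worst-case loss concrete, the following is a sketch of the standard Wasserstein DRO formulation. The abstract gives no formulas, so every symbol here is an assumption introduced for illustration: $P$ is the source distribution in representation space, $\ell(\theta; z)$ is the loss of a model with parameters $\theta$ on a point $z$, $c$ is a transport cost, $W_c$ is the induced Wasserstein distance, and $\rho$ is the radius of the uncertainty set. The paper's exact formulation may differ.

```latex
% Worst-case loss over all distributions Q within Wasserstein distance rho of P
\[
  \mathcal{L}_{\mathrm{wc}}(\theta; \rho)
  \;=\;
  \sup_{Q \,:\, W_c(Q, P) \le \rho}
  \mathbb{E}_{z \sim Q}\bigl[\ell(\theta; z)\bigr]
\]

% Dual form: equality holds under mild regularity conditions on \ell and c;
% the right-hand side evaluated at any fixed \gamma \ge 0 is always an upper bound
\[
  \mathcal{L}_{\mathrm{wc}}(\theta; \rho)
  \;=\;
  \inf_{\gamma \ge 0}
  \Bigl\{
    \gamma \rho
    + \mathbb{E}_{z \sim P}
      \Bigl[\, \sup_{z'} \bigl\{ \ell(\theta; z') - \gamma\, c(z', z) \bigr\} \Bigr]
  \Bigr\}
\]
```

Under this reading, a certificate for a trained model is an upper bound on $\mathcal{L}_{\mathrm{wc}}(\theta; \rho)$ as a function of $\rho$. Because the bound depends on the distance $\rho$ rather than on any particular held-out dataset, it would cover every unseen domain within that distance of the sources.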