Machine learning algorithms that minimize the average training loss often generalize poorly because they greedily exploit correlations in the training data that are not stable under distributional shifts. This has motivated a variety of works on domain generalization (DG), where a family of methods, such as Causal Matching and FISH, operate on pairs of domains. With $n$ domains, these methods require $O(n^2)$ pairwise domain operations, each of which is often highly expensive. Moreover, while a common objective in the DG literature is to learn representations that are invariant to domain-induced spurious correlations, we highlight the importance of also mitigating spurious correlations caused by objects. Based on the observation that diversity helps mitigate spurious correlations, we propose a Diversity boosted twO-level saMplIng framework (DOMI) that uses Determinantal Point Processes (DPPs) to efficiently sample the most informative domains from a large pool. We show that DOMI helps train models that are robust to spurious correlations from both the domain side and the object side, substantially enhancing the performance of backbone DG algorithms on the Rotated MNIST, Rotated Fashion MNIST, and iWildCam datasets.
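To illustrate the DPP-based domain selection mentioned above, here is a minimal sketch of diverse subset selection via a greedy MAP approximation to a DPP. The domain feature vectors, the linear similarity kernel, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def greedy_dpp_select(features, k):
    """Greedily pick k diverse rows by maximizing the log-determinant
    of the selected kernel submatrix (MAP approximation to a DPP)."""
    # Hypothetical similarity kernel over domains (e.g. domain embeddings).
    L = features @ features.T
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)]
            # Log-determinant as the diversity score: larger means the
            # candidate set is more mutually diverse under the kernel.
            _, logdet = np.linalg.slogdet(sub + 1e-8 * np.eye(len(idx)))
            if logdet > best_score:
                best, best_score = i, logdet
        selected.append(best)
    return selected

# Usage: pick 3 mutually diverse domains out of 10 random domain embeddings.
rng = np.random.default_rng(0)
domain_feats = rng.normal(size=(10, 5))
picked = greedy_dpp_select(domain_feats, k=3)
```

Exact DPP sampling is intractable for large ground sets, so greedy MAP selection is a common stand-in; the point is only to show how a determinant-based score rewards diversity among the selected domains.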