While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a causally-motivated balanced mini-batch sampling strategy to transform the observed training distribution into a balanced distribution that is free of spurious correlations. We argue that the Bayes optimal classifier trained on such a balanced distribution is minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee for the latent variable model of the proposed underlying data generation process with invariant causal mechanisms, given a sufficient number of training environments. Experiments on three domain generalization datasets demonstrate empirically that our balanced mini-batch sampling strategy improves the performance of four established domain generalization baselines compared to random mini-batch sampling.
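To make the core idea concrete, here is a minimal sketch of group-balanced mini-batch sampling, the simplest instance of drawing batches from a balanced distribution. It is an illustration only, not the paper's causally-motivated sampler: it assumes each example carries an observed (label, environment) annotation, and the names `balanced_batches` and `group_fn` are hypothetical. Equalizing these groups within every batch removes an observed label-environment correlation from the effective training distribution.

```python
import random
from collections import defaultdict

def balanced_batches(examples, group_fn, batch_size, num_batches, seed=0):
    """Yield mini-batches with equal representation per group.

    examples:   list of data points, e.g., (x, y, env) tuples
    group_fn:   maps an example to a group key, e.g., (label, environment),
                so that batches break the label-environment correlation
    """
    rng = random.Random(seed)

    # Index the dataset by group key.
    groups = defaultdict(list)
    for ex in examples:
        groups[group_fn(ex)].append(ex)

    keys = list(groups)
    per_group = max(1, batch_size // len(keys))
    for _ in range(num_batches):
        batch = []
        for k in keys:
            # Sample with replacement so small groups are not exhausted;
            # this is what reweights the skewed observed distribution.
            batch.extend(rng.choices(groups[k], k=per_group))
        rng.shuffle(batch)
        yield batch

# Toy usage: the label is correlated with the environment in the raw data,
# but each (label, environment) group appears equally often per batch.
data = [(x, x % 2, 0) for x in range(90)] + [(x, (x + 1) % 2, 1) for x in range(10)]
for batch in balanced_batches(data, group_fn=lambda ex: (ex[1], ex[2]),
                              batch_size=16, num_batches=3):
    print(len(batch))
```

In the paper's setting, the grouping is derived from the identified latent variable model rather than from directly observed annotations; the sketch above only conveys why balanced sampling yields a training stream free of the spurious correlation.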