Recent empirical studies on domain generalization (DG) have shown that DG algorithms that perform well on some distribution shifts fail on others, and no state-of-the-art DG algorithm performs consistently well on all shifts. Moreover, real-world data often has multiple distribution shifts over different attributes; hence we introduce multi-attribute distribution shift datasets and find that the accuracy of existing DG algorithms falls even further. To explain these results, we provide a formal characterization of generalization under multi-attribute shifts using a canonical causal graph. Based on the relationship between spurious attributes and the classification label, we obtain realizations of the canonical causal graph that characterize common distribution shifts and show that each shift entails different independence constraints over observed variables. As a result, we prove that any algorithm based on a single, fixed constraint cannot work well across all shifts, providing theoretical evidence for mixed empirical results on DG algorithms. Based on this insight, we develop Causally Adaptive Constraint Minimization (CACM), an algorithm that uses knowledge about the data-generating process to adaptively identify and apply the correct independence constraints for regularization. Results on fully synthetic, MNIST, small NORB, and Waterbirds datasets, covering binary and multi-valued attributes and labels, show that adaptive dataset-dependent constraints lead to the highest accuracy on unseen domains whereas incorrect constraints fail to do so. Our results demonstrate the importance of modeling the causal relationships inherent in the data-generating process.
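The idea of choosing an independence constraint per attribute, rather than applying one fixed constraint everywhere, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `"marginal"` / `"given_label"` constraint kinds, and the choice of a kernel MMD penalty are all assumptions made here for illustration. In this sketch, which kind applies to which attribute is supplied by the caller, standing in for knowledge about the data-generating process.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(a, b, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between two samples.
    k_aa = rbf_kernel(a, a, gamma).mean()
    k_bb = rbf_kernel(b, b, gamma).mean()
    k_ab = rbf_kernel(a, b, gamma).mean()
    return float(k_aa + k_bb - 2.0 * k_ab)


def independence_penalty(features, attr, cond=None, gamma=1.0):
    """Penalize dependence of features on attr, optionally within each value of cond.

    cond=None      -> encourages features to be independent of attr
    cond=<array>   -> encourages features to be independent of attr given cond
    """
    if cond is None:
        strata = [np.ones(len(features), dtype=bool)]
    else:
        strata = [cond == c for c in np.unique(cond)]

    total = 0.0
    for mask in strata:
        # Split the stratum by attribute value and compare every pair of groups.
        values = np.unique(attr[mask])
        groups = [features[mask & (attr == v)] for v in values]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                total += mmd2(groups[i], groups[j], gamma)
    return total


def adaptive_penalty(features, labels, attrs, constraint_kinds, gamma=1.0):
    """Sum penalties over attributes, using the constraint kind given for each one.

    attrs:            dict mapping attribute name -> array of attribute values
    constraint_kinds: dict mapping attribute name -> "marginal" or "given_label"
                      (hypothetical names; the mapping is supplied by the user)
    """
    total = 0.0
    for name, attr in attrs.items():
        kind = constraint_kinds[name]
        if kind == "marginal":
            cond = None
        elif kind == "given_label":
            cond = labels
        else:
            raise ValueError(f"unknown constraint kind: {kind}")
        total += independence_penalty(features, attr, cond, gamma)
    return total
```

In training, a penalty of this kind would be added to the classification loss with a weighting coefficient, and the features would come from the model's representation layer; a differentiable framework would replace NumPy for that purpose.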