Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to exploiting spurious correlations that should not serve as predictive cues. To mitigate this problem, we propose a causality-based training framework that reduces the spurious correlations caused by observable confounders. We provide a theoretical analysis of the underlying general Structural Causal Model (SCM) and propose performing Maximum Likelihood Estimation (MLE) on the interventional distribution instead of the observational distribution, which we call Counterfactual Maximum Likelihood Estimation (CMLE). Since the interventional distribution is, in general, hidden from the observational data, we derive two different upper bounds on the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for making causal predictions with deep learning models trained on observational data. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that both CMLE methods outperform regular MLE in out-of-domain generalization and in reducing spurious correlations, while maintaining comparable performance on the standard evaluations.
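The gap that CMLE targets, between the observational distribution p(Y | X) and the interventional distribution p(Y | do(X)), can be made concrete with a toy binary SCM and the standard backdoor adjustment. The sketch below is purely illustrative: the probabilities are invented, and this is not the paper's Implicit or Explicit CMLE algorithm.

```python
# Toy discrete SCM (Z -> X, Z -> Y, X -> Y) with an observed confounder Z.
# All probabilities are invented for illustration; this is NOT the paper's
# CMLE method, only the standard backdoor-adjustment identity it builds on.

p_z = {0: 0.7, 1: 0.3}                      # p(Z)
p_x_given_z = {0: {0: 0.9, 1: 0.1},         # p(X | Z=0)
               1: {0: 0.2, 1: 0.8}}         # p(X | Z=1)
p_y1_given_xz = {(0, 0): 0.2, (1, 0): 0.7,  # p(Y=1 | X=x, Z=z), keyed (x, z)
                 (0, 1): 0.6, (1, 1): 0.9}

def p_y1_obs(x):
    """Observational p(Y=1 | X=x) = sum_z p(Y=1 | x, z) p(z | x).
    This is what ordinary MLE on observational data fits."""
    joint = {z: p_z[z] * p_x_given_z[z][x] for z in (0, 1)}  # p(z) p(x | z)
    total = sum(joint.values())
    return sum(p_y1_given_xz[(x, z)] * joint[z] / total for z in (0, 1))

def p_y1_do(x):
    """Interventional p(Y=1 | do(X=x)) = sum_z p(Y=1 | x, z) p(z),
    i.e. backdoor adjustment over the observed confounder Z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in (0, 1))
```

Because Z pushes X and Y in the same direction here, p(Y=1 | X=1) exceeds p(Y=1 | do(X=1)): the observational conditional absorbs the spurious Z-induced correlation. CMLE's goal is to fit the interventional quantity instead, which for deep models on real data must be approached through the upper bounds rather than computed in closed form as in this toy example.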