Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to learning spurious correlations that should not be adopted as predictive clues. To mitigate this problem, we propose a causality-based training framework that reduces spurious correlations caused by observed confounders. We provide a theoretical analysis of the underlying general Structural Causal Model (SCM) and propose performing Maximum Likelihood Estimation (MLE) on the interventional distribution rather than the observational distribution, which we call Counterfactual Maximum Likelihood Estimation (CMLE). Because the interventional distribution is, in general, hidden from the observational data, we derive two different upper bounds of the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for making causal predictions with deep learning models trained on observational data. We conduct experiments on simulated data and on two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that both CMLE methods outperform regular MLE in out-of-domain generalization and in reducing spurious correlations, while maintaining comparable performance on standard evaluations.
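To make the objective concrete, here is a minimal sketch in illustrative notation (the symbols $x$, $y$, $z$, and $p_{\theta}$ are our own shorthand, not necessarily the paper's exact definitions). Ordinary MLE fits the observational conditional distribution,

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \; \mathbb{E}_{(x,y) \sim P(X,Y)} \left[ \log p_{\theta}(y \mid x) \right],$$

whereas CMLE maximizes likelihood under the interventional distribution $P(Y \mid \mathrm{do}(X=x))$, which, for an observed confounder $Z$, can be written via the standard backdoor adjustment as $P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)$:

$$\hat{\theta}_{\mathrm{CMLE}} = \arg\max_{\theta} \; \mathbb{E}_{x \sim P(X)} \, \mathbb{E}_{y \sim P(Y \mid \mathrm{do}(X=x))} \left[ \log p_{\theta}(y \mid x) \right].$$

Since the inner expectation cannot be evaluated directly from observational samples, the two proposed algorithms instead optimize upper bounds of the corresponding expected negative log-likelihood.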