Informally, a `spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter. In machine learning, these have a know-it-when-you-see-it character; e.g., changing the gender of a sentence's subject changes a sentiment predictor's output. To check for spurious correlations, we can `stress test' models by perturbing irrelevant parts of input data and seeing if model predictions change. In this paper, we study stress testing using the tools of causal inference. We introduce \emph{counterfactual invariance} as a formalization of the requirement that changing irrelevant parts of the input shouldn't change model predictions. We connect counterfactual invariance to out-of-domain model performance, and provide practical schemes for learning (approximately) counterfactually invariant predictors (without access to counterfactual examples). It turns out that both the means and implications of counterfactual invariance depend fundamentally on the true underlying causal structure of the data. Distinct causal structures require distinct regularization schemes to induce counterfactual invariance. Similarly, counterfactual invariance implies different domain-shift guarantees depending on the underlying causal structure. This theory is supported by empirical results on text classification.
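To make the stress-testing procedure concrete, the following is a minimal sketch in Python. The \texttt{predict\_sentiment} callable and the word-level gender swap are hypothetical stand-ins for illustration, not the paper's counterfactual construction:

\begin{verbatim}
# Minimal stress-test sketch. `predict_sentiment` is a hypothetical
# callable (any text classifier returning a label); the word-level
# pronoun swap is a toy perturbation of an irrelevant attribute.

GENDER_SWAPS = {"he": "she", "she": "he", "him": "her",
                "her": "him", "his": "hers", "hers": "his"}

def perturb_gender(sentence):
    """Swap gendered pronouns; leave all other words unchanged."""
    return " ".join(GENDER_SWAPS.get(w.lower(), w)
                    for w in sentence.split())

def stress_test(sentences, predict_sentiment):
    """Flag sentences whose prediction changes under the perturbation."""
    return [s for s in sentences
            if predict_sentiment(s) != predict_sentiment(perturb_gender(s))]
\end{verbatim}

A counterfactually invariant predictor would flag no sentences here; in practice, the fraction flagged gives a rough measure of the model's dependence on the perturbed attribute.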