We propose a method to distinguish causal influence from hidden confounding in the following scenario: given a target variable Y, potential causal drivers X, and a large number of background features, we introduce a novel criterion for identifying causal relationships based on how stable the coefficients of X in a regression of Y remain when different background features are selected. To this end, we define a statistic V that measures the variability of these coefficients. We prove, subject to a symmetry assumption on the background influence, that V converges to zero if and only if X contains no causal drivers. In experiments with simulated data, the method outperforms state-of-the-art algorithms. Further, we report encouraging results on real-world data. Our approach is in line with the general belief that causal insights allow statistical associations to generalize better across environments, and it provides a justification for similar heuristic approaches in the literature.
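To make the idea of coefficient stability concrete, the following is a minimal sketch rather than the statistic V itself, whose exact definition is not given here. It assumes that the selected background features enter the regression as additional covariates, that selection means drawing random subsets of a fixed size, and that variability is summarized as the summed variance of the fitted X coefficients across subsets. The function name `coefficient_variability` and its parameters `subset_size` and `n_subsets` are hypothetical.

```python
import numpy as np


def coefficient_variability(X, Y, Z, subset_size, n_subsets=200, seed=0):
    """Summed variance of the OLS coefficients of X across random background subsets.

    Illustrative assumption only: this is NOT the paper's statistic V.
    X: (n, d) candidate drivers, Y: (n,) target, Z: (n, p) background features.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    coefs = np.empty((n_subsets, d))
    for i in range(n_subsets):
        # Draw a random subset of background features (assumed selection scheme).
        idx = rng.choice(Z.shape[1], size=subset_size, replace=False)
        # Regress Y on an intercept, X, and the selected background features.
        design = np.column_stack([np.ones(n), X, Z[:, idx]])
        beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
        # Keep only the coefficients belonging to X (columns 1..d of the design).
        coefs[i] = beta[1:1 + d]
    # Variance per X coefficient across subsets, summed over the d coefficients.
    return float(coefs.var(axis=0).sum())


if __name__ == "__main__":
    # Synthetic data, purely for illustration.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2))
    Z = rng.normal(size=(500, 30))
    Y = X[:, 0] + Z.sum(axis=1) + rng.normal(size=500)
    print(coefficient_variability(X, Y, Z, subset_size=5))
```

The sketch only quantifies how much the fitted coefficients of X move as the background subset changes. How such a quantity relates to the presence of causal drivers depends on the actual definition of V and on the symmetry assumption stated above.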