Latent variables often mask cause-effect relationships in observational data, producing spurious links that may be misinterpreted as causal. This problem is of great interest in fields such as climate science and economics. We propose to estimate confounded causal links in time series using a Sequential Causal Effect Variational Autoencoder (SCEVAE) combined with Knockoff interventions. Knockoff variables have the same distribution as the originals and preserve their correlation with other variables, which allows for counterfactuals that are more faithful to the observational distribution. We show the advantage of Knockoff interventions by applying SCEVAE to synthetic datasets with both linear and nonlinear causal links. Moreover, we apply SCEVAE with Knockoffs to real aerosol-cloud-climate observational time series data. On synthetic data, we compare our results to those of a time series deconfounding method, both with and without estimated confounders, and show that our method outperforms this benchmark when both are evaluated against the ground truth. For the real data analysis, we rely on expert knowledge of causal links and demonstrate how using suitable proxy variables improves causal link estimation in the presence of hidden confounders.