Understanding and characterizing causal effects has become essential in observational studies, but doing so remains challenging when the confounders are high-dimensional. In this article, we develop a general framework, $\textit{CausalEGM}$, for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounder space and a low-dimensional latent space where the density is known (e.g., a multivariate normal distribution). Through this transformation, CausalEGM simultaneously decouples the dependencies of the confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on the encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high-dimensional confounders. The CausalEGM software is freely available at https://github.com/SUwonglab/CausalEGM.
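To make the idea of conditioning on low-dimensional features concrete, the following is a minimal sketch on simulated data, not the CausalEGM implementation or its API. It assumes a linear data-generating process, substitutes PCA for the learned encoder, and substitutes ordinary least squares for the outcome network. All function names, parameters, and the true effect `tau = 2.0` are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=5000, p=50, k=2, tau=2.0, seed=0):
    """Simulate high-dimensional confounders driven by a k-dim latent (assumed setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, k))                      # low-dimensional latent features
    w = rng.normal(size=(k, p))                      # hypothetical loading matrix
    v = z @ w + 0.1 * rng.normal(size=(n, p))        # observed high-dimensional confounders
    propensity = 1.0 / (1.0 + np.exp(-z[:, 0]))      # treatment depends on the latent
    x = rng.binomial(1, propensity).astype(float)    # binary treatment
    y = tau * x + 3.0 * z[:, 0] + z[:, 1] + rng.normal(size=n)  # outcome
    return v, x, y


def encode_pca(v, k):
    """Stand-in encoder: project centered confounders onto the top-k principal axes."""
    vc = v - v.mean(axis=0)
    _, _, vt = np.linalg.svd(vc, full_matrices=False)
    return vc @ vt[:k].T


def ate_by_adjustment(features, x, y):
    """Regress the outcome on treatment and features; return the treatment coefficient."""
    design = np.column_stack([np.ones_like(x), x, features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def naive_difference(x, y):
    """Unadjusted difference in mean outcomes between treated and untreated units."""
    return y[x == 1].mean() - y[x == 0].mean()


v, x, y = simulate()
latent = encode_pca(v, k=2)
ate_adjusted = ate_by_adjustment(latent, x, y)
ate_naive = naive_difference(x, y)
print(f"naive: {ate_naive:.2f}, adjusted on latent features: {ate_adjusted:.2f}")
```

In this toy setting the unadjusted difference is biased because the latent variable drives both treatment and outcome, whereas adjusting for two recovered features instead of all 50 confounders brings the estimate close to the simulated effect. The linear encoder and outcome model are simplifications that only work here because the simulation itself is linear.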