Observational data are becoming dramatically more available across many domains of science and technology, which facilitates the study of causal inference. However, estimating treatment effects from observational data faces two major challenges: missing counterfactual outcomes and treatment selection bias. Matching methods are among the most widely used and fundamental approaches to estimating treatment effects, but existing matching methods perform poorly on data with high-dimensional and complicated variables. We propose feature selection representation matching (FSRM), a method based on deep representation learning and matching that maps the original covariate space into a selective, nonlinear, and balanced representation space and then conducts matching in the learned representation space. FSRM adopts deep feature selection to minimize the influence of variables that are irrelevant for estimating treatment effects, and it incorporates a regularizer based on the Wasserstein distance to learn balanced representations. We evaluate FSRM on three datasets, and the results show that it outperforms state-of-the-art methods.
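The matching step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a representation `rep` has already been learned (here just a toy one-dimensional array), and the function name `nn_match_effects`, the choice of 1-nearest-neighbor matching with replacement, and the Euclidean distance are all assumptions made for illustration. The deep feature selection and the Wasserstein regularizer are omitted.

```python
import numpy as np

def nn_match_effects(rep, t, y):
    """Hypothetical helper: 1-nearest-neighbor matching in a representation space.

    rep : (n, d) array of representations (assumed to be already learned)
    t   : (n,) binary treatment indicator
    y   : (n,) observed (factual) outcomes
    Returns per-unit effect estimates and their mean (an ATE estimate).
    """
    rep = np.asarray(rep, dtype=float)
    t = np.asarray(t).astype(bool)
    y = np.asarray(y, dtype=float)
    treated, control = np.where(t)[0], np.where(~t)[0]

    # Pairwise Euclidean distances, shape (n_treated, n_control).
    diff = rep[treated][:, None, :] - rep[control][None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Impute each missing counterfactual outcome with the outcome of the
    # closest unit from the opposite treatment group (with replacement).
    y_cf = np.empty_like(y)
    y_cf[treated] = y[control[dist.argmin(axis=1)]]
    y_cf[control] = y[treated[dist.argmin(axis=0)]]

    # Effect = outcome under treatment minus outcome under control.
    ite = np.where(t, y - y_cf, y_cf - y)
    return ite, ite.mean()

# Toy data (made up for illustration): treatment raises the outcome by 2.
rep = np.array([[0.0], [1.0], [0.1], [1.1]])
t = np.array([1, 1, 0, 0])
y = np.array([3.0, 5.0, 1.0, 3.0])
ite, ate = nn_match_effects(rep, t, y)
print(ate)  # 2.0
```

Matching in the raw covariate space would use the same routine with `rep` set to the covariates. The point of a learned representation, as the abstract argues, is that distances become more meaningful when irrelevant variables are suppressed and the treated and control distributions are balanced.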