Causal inference is a critical research area with multidisciplinary origins and applications, spanning statistics, computer science, economics, psychology, and public health. In much scientific research, randomized experiments have for decades provided the gold standard for estimating causal effects. In many situations, however, randomized experiments are not feasible, so practitioners must rely on empirical investigation for causal reasoning. Causal inference from observational data is challenging because the treatment assignment mechanism is unknown, so non-testable assumptions are typically required to make inference possible. For several years, great effort has been devoted to causal inference for binary treatments. In practice, it is also common to use observational data to compare multiple treatments. Within the potential outcomes framework, we propose a generalized cross-fitting (GCF) estimator, which generalizes the doubly robust estimator with cross-fitting from binary treatments to multiple treatment comparisons, and we provide rigorous proofs of its statistical properties. The estimator permits the use of more flexible machine learning methods to model the nuisance components and relies on relatively weak assumptions, while still offering a theoretical guarantee of valid statistical inference. We establish the asymptotic properties of the GCF estimator and provide asymptotic simultaneous confidence intervals that achieve the semiparametric efficiency bound for the average treatment effect. The performance of the estimator is assessed in a simulation study using evaluation metrics commonly considered in the causal inference literature.
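To make the idea of a cross-fitted doubly robust estimator for several treatment levels concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single binary covariate, three treatment levels, and simulated data, and it uses stratum-wise frequencies and means as stand-ins for the machine learning nuisance models. The function names (`fit_nuisances`, `cross_fit_aipw`, `pairwise_effects`, `simulate`) are hypothetical, and the Bonferroni correction is a simple substitute for simultaneous intervals; the construction used in the paper may differ.

```python
import random
from itertools import combinations
from statistics import NormalDist, mean, stdev


def fit_nuisances(rows):
    """Stratum-wise plug-in nuisances (stand-ins for ML learners).

    propensity(x, t) estimates P(T = t | X = x); outcome(x, t) estimates
    E[Y | T = t, X = x]. Assumes every (x, t) cell appears in `rows`.
    """
    totals, counts, sums = {}, {}, {}
    for x, t, y in rows:
        totals[x] = totals.get(x, 0) + 1
        counts[(x, t)] = counts.get((x, t), 0) + 1
        sums[(x, t)] = sums.get((x, t), 0.0) + y

    def propensity(x, t):
        return counts[(x, t)] / totals[x]

    def outcome(x, t):
        return sums[(x, t)] / counts[(x, t)]

    return propensity, outcome


def cross_fit_aipw(data, treatments, n_folds=5, seed=0):
    """Out-of-fold doubly robust (AIPW) scores, one list per treatment level."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = {t: [0.0] * len(data) for t in treatments}
    for k, held_out in enumerate(folds):
        # Fit nuisances on the other folds only, then score the held-out fold.
        train = [data[i] for j, fold in enumerate(folds) if j != k for i in fold]
        propensity, outcome = fit_nuisances(train)
        for i in held_out:
            x, t_obs, y = data[i]
            for t in treatments:
                m = outcome(x, t)
                correction = (y - m) / propensity(x, t) if t_obs == t else 0.0
                scores[t][i] = m + correction
    return scores


def pairwise_effects(scores, alpha=0.05):
    """Pairwise effect estimates with Bonferroni-adjusted normal intervals."""
    pairs = list(combinations(sorted(scores), 2))
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(pairs)))
    out = {}
    for s, t in pairs:
        diff = [a - b for a, b in zip(scores[t], scores[s])]
        est = mean(diff)
        half = z * stdev(diff) / len(diff) ** 0.5
        out[(t, s)] = (est, est - half, est + half)
    return out


def simulate(n=3000, seed=1):
    """Confounded toy data: x shifts both treatment choice and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)
        weights = [0.5, 0.3, 0.2] if x == 0 else [0.2, 0.3, 0.5]
        t = rng.choices([0, 1, 2], weights=weights)[0]
        y = 1.0 * t + 2.0 * x + rng.gauss(0, 1)  # true effect of t vs s is t - s
        data.append((x, t, y))
    return data


data = simulate()
effects = pairwise_effects(cross_fit_aipw(data, treatments=[0, 1, 2]))
for (t, s), (est, lo, hi) in effects.items():
    print(f"E[Y({t}) - Y({s})]: {est:.3f}  [{lo:.3f}, {hi:.3f}]")
```

In this toy setup the covariate confounds the comparison, so a naive difference in group means would be biased, whereas the cross-fitted scores adjust for it. Replacing `fit_nuisances` with flexible learners fitted on the training folds would leave the rest of the procedure unchanged.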