Causal inference concerns not only the average effect of a treatment on an outcome but also the underlying mechanism that operates through an intermediate variable of interest. Principal stratification characterizes such a mechanism by targeting subgroup causal effects within principal strata, which are defined by the joint potential values of the intermediate variable. Because of the fundamental problem of causal inference, principal strata are inherently latent, which makes it challenging to identify and estimate subgroup effects within them. A line of research leverages the principal ignorability assumption, under which the potential outcomes are mean independent of the latent principal strata conditional on the observed covariates. Under principal ignorability, we derive various nonparametric identification formulas for causal effects within principal strata in observational studies; these formulas motivate estimators that rely on the correct specification of different parts of the observed-data distribution. Appropriately combining these estimators further yields new triply robust estimators for the causal effects within principal strata. These new estimators are consistent if any two of the treatment, intermediate variable, and outcome models are correctly specified, and they are locally efficient if all three models are correctly specified. We show that these estimators arise naturally from either the efficient influence functions in semiparametric theory or the model-assisted estimators in survey sampling theory. We evaluate the estimators' finite-sample performance through simulation, apply them to two observational studies, and implement them in an open-source software package.
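For readers who want the objects named above in symbols, the following is a minimal sketch in assumed notation. The abstract itself fixes no symbols, so Z (binary treatment), S (intermediate variable), Y (outcome), X (covariates), U, and τ are illustrative choices, and the exact set of strata to which the ignorability condition applies may be narrower in the paper than shown here.

```latex
% Assumed notation (not given in the abstract):
%   Z: binary treatment, S: intermediate variable, Y: outcome, X: covariates
\begin{align*}
U &= \bigl(S(1),\, S(0)\bigr)
  && \text{principal stratum: joint potential values of } S \\
\tau_{u} &= E\{\, Y(1) - Y(0) \mid U = u \,\}
  && \text{causal effect within stratum } u \\
E\{\, Y(z) \mid U = u,\, X \,\} &= E\{\, Y(z) \mid U = u',\, X \,\}
  && \text{principal ignorability, for the strata } u, u' \text{ being compared}
\end{align*}
```

In this notation, U is latent because each unit reveals only S = S(Z), one of the two potential values of the intermediate variable, which is why the subgroup effects cannot be identified without an assumption such as principal ignorability.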