Most causal inference methods focus on estimating marginal average treatment effects, but many important causal estimands depend on the joint distribution of potential outcomes, including the probability of causation and the proportions of units that benefit from or are harmed by treatment. Wu et al. (2025) recently established nonparametric identification of this joint distribution for categorical outcomes under a binary treatment by leveraging variation across multiple studies. We demonstrate that their multi-study framework can be implemented within a single study by using a baseline covariate that is associated with the untreated potential outcomes but does not modify treatment effects conditional on those outcomes. This reframing substantially broadens the practical applicability of their results: it eliminates the need for multiple independent datasets and gives analysts control over covariate selection, so the covariate can be chosen to satisfy the key identifying assumptions. We provide complete identification and estimation theory for the single-study setting, including a Neyman-orthogonal estimator for cases where the conditional independence assumption holds only after adjusting for covariates. However, we argue that the identifying assumptions could hold exactly, even in principle, only in unusual settings, which makes sensitivity analysis particularly important. We validate the estimator in a simulation and apply it to data from a large field experiment assessing the effect of mailings on voter turnout.
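The core identification step can be illustrated with a minimal sketch. It assumes a binary outcome, a binary baseline covariate Z, and a randomized binary treatment. Under the no-effect-modification assumption, q_y = P(Y1 = 1 | Y0 = y) is the same at every level of Z. Each treated-arm mean P(Y1 = 1 | Z = z) is therefore a mixture of the same two unknowns q_0 and q_1, with weights P(Y0 = 1 | Z = z) read off the control arm. When those weights differ across Z, the resulting 2x2 linear system has a unique solution.

Several parts of the sketch are illustrative choices, not the paper's notation or method: the function names, the simulation parameters, and the use of P(Y0 = 0 | Y1 = 1) as the probability of causation. The sketch is also a naive plug-in estimate, not the Neyman-orthogonal estimator, and it does no covariate adjustment.

```python
import random


def identify_joint(a, b, pz1):
    """Plug-in identification of the joint law of (Y0, Y1) in one study.

    a[z]: P(Y0 = 1 | Z = z), estimated from the control arm (T = 0)
    b[z]: P(Y1 = 1 | Z = z), estimated from the treated arm (T = 1)
    pz1:  P(Z = 1)
    Assumes q[y] = P(Y1 = 1 | Y0 = y, Z = z) does not depend on z.
    """
    if abs(a[1] - a[0]) < 1e-12:
        raise ValueError("Z must be associated with Y0: a[0] == a[1]")
    # b[z] = q1 * a[z] + q0 * (1 - a[z]); differencing over z isolates q1 - q0.
    diff = (b[1] - b[0]) / (a[1] - a[0])
    q0 = b[0] - diff * a[0]
    q1 = q0 + diff
    # Marginal P(Y0 = 1), averaging over the covariate distribution.
    p_y0 = (1 - pz1) * a[0] + pz1 * a[1]
    joint = {
        (0, 0): (1 - q0) * (1 - p_y0),
        (0, 1): q0 * (1 - p_y0),  # benefit: Y0 = 0, Y1 = 1
        (1, 0): (1 - q1) * p_y0,  # harm: Y0 = 1, Y1 = 0
        (1, 1): q1 * p_y0,
    }
    return q0, q1, joint


def summaries(joint):
    """Benefit, harm, and P(Y0 = 0 | Y1 = 1) from the joint table."""
    benefit = joint[(0, 1)]
    harm = joint[(1, 0)]
    p_y1 = joint[(0, 1)] + joint[(1, 1)]
    return benefit, harm, benefit / p_y1


def simulate(n, a, q, pz1, seed=0):
    """Hypothetical data-generating process satisfying the assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < pz1)
        y0 = int(rng.random() < a[z])
        y1 = int(rng.random() < q[y0])  # depends on Y0 only, not on Z
        t = int(rng.random() < 0.5)     # randomized treatment
        rows.append((z, t, y1 if t else y0))
    return rows


def cell_means(rows):
    """Arm-by-covariate means of the observed outcome, plus P(Z = 1)."""
    sums = {(z, t): [0, 0] for z in (0, 1) for t in (0, 1)}
    for z, t, y in rows:
        sums[(z, t)][0] += y
        sums[(z, t)][1] += 1
    a = [sums[(z, 0)][0] / sums[(z, 0)][1] for z in (0, 1)]
    b = [sums[(z, 1)][0] / sums[(z, 1)][1] for z in (0, 1)]
    pz1 = sum(1 for z, _, _ in rows if z == 1) / len(rows)
    return a, b, pz1


# Population check with hypothetical values a = (0.2, 0.6), q = (0.3, 0.9).
a_true, q_true = [0.2, 0.6], [0.3, 0.9]
b_true = [q_true[1] * az + q_true[0] * (1 - az) for az in a_true]
q0, q1, joint = identify_joint(a_true, b_true, 0.5)
benefit, harm, pc = summaries(joint)
print(round(benefit, 4), round(harm, 4), round(pc, 4))  # 0.18 0.04 0.3333

# Finite-sample plug-in estimate from simulated data.
rows = simulate(200_000, a_true, q_true, 0.5, seed=1)
_, _, joint_hat = identify_joint(*cell_means(rows))
print(summaries(joint_hat))
```

The sketch plugs raw cell means into the closed-form solution, so estimates of q_0 and q_1 can fall outside [0, 1] when a[0] and a[1] are close. A weak association between Z and Y0 makes the solution unstable, which is the practical face of the requirement that the covariate be associated with the untreated potential outcomes.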