We propose a novel framework for synthesizing counterfactual treatment group data at a target site by combining complete treatment and control group data from a source site with control group data from the target site. Departing from conventional average treatment effect estimation, our approach adopts a distributional causal inference perspective, modeling the treatment and control groups as distinct probability measures on the source and target sites. We formalize cross-site heterogeneity (effect modification) as a push-forward transformation that maps the joint feature-outcome distribution from the source site to the target site. This transformation is learned by aligning the control group distributions of the two sites with an optimal transport-based procedure, and is then applied to the source treatment group to generate the synthetic target treatment distribution. Under general regularity conditions, we establish theoretical guarantees for the consistency and asymptotic convergence of the synthetic treatment group data to the true target distribution. Simulation studies across multiple data-generating scenarios and a real-world application to patient-derived xenograft data demonstrate that our framework robustly recovers the full distributional properties of treatment effects.
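To make the two-step idea concrete (learn a transport map on the control groups, then push the source treatment group through it), the following is a minimal sketch and not the procedure used in this work. It assumes, purely for illustration, that the joint feature-outcome samples are roughly Gaussian, so that the optimal transport map between the two control groups has the well-known closed affine form. The function names `fit_gaussian_ot_map` and `_sym_sqrt`, the `ridge` parameter, and the simulated data are all hypothetical.

```python
import numpy as np


def _sym_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_gaussian_ot_map(source_control, target_control, ridge=1e-8):
    """Fit the affine OT map between Gaussian approximations of two samples.

    Assumption (not from the paper): both control groups are summarized by
    their empirical mean and covariance. Rows are joint (feature, outcome) points.
    """
    mu_s = source_control.mean(axis=0)
    mu_t = target_control.mean(axis=0)
    d = source_control.shape[1]
    cov_s = np.cov(source_control, rowvar=False) + ridge * np.eye(d)
    cov_t = np.cov(target_control, rowvar=False) + ridge * np.eye(d)

    root_s = _sym_sqrt(cov_s)
    inv_root_s = np.linalg.inv(root_s)
    # A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}, which satisfies A S A = T
    A = inv_root_s @ _sym_sqrt(root_s @ cov_t @ root_s) @ inv_root_s

    def transport(z):
        # Push points forward: z -> mu_t + A (z - mu_s)
        return mu_t + (z - mu_s) @ A.T

    return transport


# Simulated data, for illustration only: columns are (feature, outcome).
rng = np.random.default_rng(0)
source_control = rng.normal(size=(500, 2))
target_control = rng.normal(size=(400, 2)) * [1.5, 0.5] + [2.0, -1.0]
source_treated = rng.normal(size=(300, 2)) + [0.0, 1.0]

# Step 1: learn the map by aligning the two control groups.
transport = fit_gaussian_ot_map(source_control, target_control)
# Step 2: apply the same map to the source treatment group.
synthetic_target_treated = transport(source_treated)
```

By construction, the fitted map sends the source control sample to a sample whose mean and covariance match those of the target control sample. A nonparametric transport estimator would replace the Gaussian assumption when the full distributional shape matters, which is the setting the abstract describes.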