We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data are stored locally at each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods for inference on the average treatment effects in the combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.
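The two-step structure described above (local propensity-score summaries, then cross-site aggregation) can be illustrated with a minimal sketch. It is not the estimators developed in this work: it assumes a single discrete covariate so that propensity scores can be estimated by within-stratum treatment frequencies, uses a normalized (Hajek-type) inverse-propensity-weighted estimate per site, and applies a plug-in variance that treats the estimated propensity scores as known. The names `SiteSummary`, `local_summary`, `aggregate`, and `simulate_site` are hypothetical, and the two weighting schemes shown (sample-size and inverse-variance) are generic choices used only for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SiteSummary:
    n: int      # local sample size
    ate: float  # local IPW point estimate of the average treatment effect
    var: float  # estimated variance of the local point estimate


def local_summary(x, w, y):
    """Compute site-level summaries; only these leave the site.

    x: discrete covariate, w: 0/1 treatment indicator, y: outcome.
    """
    # Propensity score per stratum: share of treated units (assumption:
    # x is discrete, so no parametric model is needed).
    counts = defaultdict(lambda: [0, 0])  # stratum -> [units, treated units]
    for xi, wi in zip(x, w):
        counts[xi][0] += 1
        counts[xi][1] += wi
    e = [counts[xi][1] / counts[xi][0] for xi in x]
    if any(ei <= 0.0 or ei >= 1.0 for ei in e):
        raise ValueError("overlap violated: a stratum lacks treated or control units")

    # Normalized inverse-propensity-weighted means of the two arms.
    s1 = sum(wi / ei for wi, ei in zip(w, e))
    s0 = sum((1 - wi) / (1 - ei) for wi, ei in zip(w, e))
    mu1 = sum(wi * yi / ei for wi, yi, ei in zip(w, y, e)) / s1
    mu0 = sum((1 - wi) * yi / (1 - ei) for wi, yi, ei in zip(w, y, e)) / s0

    # Plug-in variance from per-unit score contributions (simplification:
    # ignores the extra term from estimating the propensity scores).
    n = len(y)
    psi = [wi * (yi - mu1) / ei - (1 - wi) * (yi - mu0) / (1 - ei)
           for wi, yi, ei in zip(w, y, e)]
    var = sum(p * p for p in psi) / n ** 2
    return SiteSummary(n=n, ate=mu1 - mu0, var=var)


def aggregate(summaries, scheme="sample_size"):
    """Combine site summaries into a pooled point and variance estimate."""
    if scheme == "sample_size":
        raw = [s.n for s in summaries]          # targets the pooled population
    elif scheme == "inverse_variance":
        raw = [1.0 / s.var for s in summaries]  # sensible if effects are equal
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    weights = [r / total for r in raw]
    ate = sum(wt * s.ate for wt, s in zip(weights, summaries))
    # Sites are independent, so the variances add with squared weights.
    var = sum(wt ** 2 * s.var for wt, s in zip(weights, summaries))
    return ate, var


def simulate_site(n, effect, seed, base):
    """Toy data; `base` shifts the treatment assignment mechanism per site."""
    rng = random.Random(seed)
    x, w, y = [], [], []
    for _ in range(n):
        xi = rng.choice([0, 1, 2])
        wi = 1 if rng.random() < base + 0.2 * xi else 0
        x.append(xi)
        w.append(wi)
        y.append(xi + effect * wi + rng.gauss(0.0, 1.0))
    return x, w, y
```

As a usage example under the same assumptions, one could call `local_summary(*simulate_site(4000, 2.0, seed=1, base=0.2))` at one site and the same with `base=0.4` at another, then pass the two summaries to `aggregate`. The choice of scheme matters: inverse-variance weights are efficient when sites share a common effect, whereas sample-size weights keep the estimand tied to the combined population when effects differ across sites.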