We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data are stored locally at each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the average treatment effects for the combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of the average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.
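The two-stage structure described above (local summary statistics, then cross-site aggregation) can be illustrated with a minimal sketch. It is not the estimator developed in the paper. It assumes a single discrete covariate, a propensity score estimated as the treated share within each covariate stratum, an inverse-propensity-weighted (IPW) score, a simple plug-in variance, and sample-size weights for aggregation. The function names `local_summary` and `aggregate` and the simulated data are hypothetical.

```python
import numpy as np


def local_summary(x, w, y):
    """Run inside one site; returns aggregate numbers only, never rows.

    x: discrete covariate, w: 0/1 treatment indicator, y: outcome.
    Assumes every covariate stratum has both treated and control units.
    """
    x = np.asarray(x)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumed propensity model: treated share within each covariate stratum.
    e = np.empty(len(x))
    for v in np.unique(x):
        mask = x == v
        e[mask] = w[mask].mean()

    # Per-individual IPW score; its mean is the local ATE estimate.
    psi = w * y / e - (1.0 - w) * y / (1.0 - e)
    n = len(y)
    # Simple plug-in variance of the mean (ignores propensity estimation error).
    return {"n": n, "ate": psi.mean(), "var": psi.var(ddof=1) / n}


def aggregate(summaries):
    """Combine site summaries with sample-size weights.

    Sample-size weights target the ATE of the pooled population even when
    effects differ across sites; sites are treated as independent.
    """
    n = np.array([s["n"] for s in summaries], dtype=float)
    ate = np.array([s["ate"] for s in summaries])
    var = np.array([s["var"] for s in summaries])
    weights = n / n.sum()
    return float(weights @ ate), float(weights**2 @ var)


def simulate_site(rng, n, base, slope):
    """Hypothetical site: treatment probability is base + slope * x."""
    x = rng.integers(0, 2, size=n)
    w = rng.binomial(1, base + slope * x)
    y = 1.0 + 2.0 * w + x + rng.normal(size=n)  # true effect is 2.0
    return x, w, y


rng = np.random.default_rng(0)
site_a = local_summary(*simulate_site(rng, 4000, 0.3, 0.4))
site_b = local_summary(*simulate_site(rng, 6000, 0.6, -0.3))
point, variance = aggregate([site_a, site_b])
print(point, variance ** 0.5)
```

Only the three numbers returned by `local_summary` leave each site, which is the sense in which the sketch respects the privacy constraint. Inverse-variance weights are a common alternative in `aggregate`, but they target the pooled effect only when treatment effects are homogeneous across sites, which illustrates why the choice of aggregation scheme matters under heterogeneity.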