Integrating information from multiple data sources can enable more precise, timely, and generalizable decisions. However, it is challenging to make valid causal inferences using observational data from multiple data sources. For example, in healthcare, learning from electronic health records contained in different hospitals is desirable but difficult due to heterogeneity in patient case mix, differences in treatment guidelines, and data privacy regulations that preclude individual patient data from being pooled. Motivated to overcome these issues, we develop a federated causal inference framework. We devise a doubly robust estimator of the mean potential outcome in a target population and show that it is consistent even when some models are misspecified. To enable real-world use, our proposed algorithm is privacy-preserving (requiring only summary statistics to be shared between hospitals) and communication-efficient (requiring only one round of communication between hospitals). We implement our causal estimation and inference procedure to investigate the quality of hospital care provided by a diverse set of 51 candidate Cardiac Centers of Excellence, as measured by 30-day mortality and length of stay for acute myocardial infarction (AMI) patients. We find that our proposed federated global estimator improves the precision of treatment effect estimates by 59% to 91% compared to using data from the target hospital alone. This precision gain results in qualitatively different conclusions about the estimated effect of percutaneous coronary intervention (PCI) compared to medical management (MM) in 63% (32 of 51) of hospitals. We find that hospitals rarely excel in both PCI and MM, which highlights the importance of assessing performance on specific treatment regimens.
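To make the two ingredients named above concrete, here is a minimal sketch in NumPy. It assumes a logistic propensity model and a linear outcome model. Each site computes a doubly robust (augmented inverse probability weighted, AIPW) estimate of a mean potential outcome E[Y(a)] and shares only that estimate and its variance. A coordinator then pools the site summaries in a single round. This is a simplified stand-in and not the paper's estimator. The paper targets the population of a specific target hospital and accounts for between-site heterogeneity, whereas the inverse-variance pooling here assumes every site estimates the same quantity. All function names (`fit_logistic`, `fit_linear`, `aipw_site_summary`, `combine_one_round`) and the simulated data are hypothetical.

```python
import numpy as np


def fit_logistic(X, a, n_iter=25):
    """Logistic regression via Newton-Raphson; coefficients with intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta


def fit_linear(X, y):
    """Ordinary least squares outcome model; coefficients with intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def aipw_site_summary(X, a, y, arm=1):
    """Hypothetical per-site step: AIPW estimate of E[Y(arm)] and its variance.

    Only these two numbers leave the site (no individual patient data).
    The estimate is consistent if either the propensity model or the
    outcome model is correctly specified (double robustness).
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    ps = 1.0 / (1.0 + np.exp(-Xd @ fit_logistic(X, a)))  # P(A = 1 | X)
    if arm == 0:
        ps = 1.0 - ps                                      # P(A = 0 | X)
    in_arm = (a == arm)
    mu = Xd @ fit_linear(X[in_arm], y[in_arm])             # E[Y | A = arm, X]
    # Outcome-model prediction plus inverse-propensity-weighted residual
    phi = mu + in_arm * (y - mu) / ps
    return phi.mean(), phi.var(ddof=1) / len(phi)


def combine_one_round(summaries):
    """Hypothetical coordinator step: inverse-variance pooling of site summaries."""
    est = np.array([s[0] for s in summaries])
    var = np.array([s[1] for s in summaries])
    w = (1.0 / var) / np.sum(1.0 / var)
    return np.sum(w * est), 1.0 / np.sum(1.0 / var)


# Simulated example: three sites, true E[Y(1)] = 1 + 2 = 3
rng = np.random.default_rng(0)
summaries = []
for _ in range(3):
    n = 2000
    X = rng.normal(size=(n, 2))
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1]))))
    y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + 2.0 * a + rng.normal(size=n)
    summaries.append(aipw_site_summary(X, a, y, arm=1))

pooled, pooled_var = combine_one_round(summaries)
print(pooled, pooled_var)
```

The pooled variance is always smaller than any single site's variance, which is the basic mechanism behind the precision gains reported above. The paper's estimator additionally guards against bias from sites whose populations differ from the target hospital, which naive pooling does not.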