Driven by the need to generate real-world evidence from multi-site collaborative studies, we introduce an efficient collaborative learning approach to evaluate average treatment effect in a multi-site setting under data sharing constraints. Specifically, the proposed method operates in a federated manner, using individual-level data from a user-defined target population and summary statistics from other source populations, to construct efficient estimator for the average treatment effect on the target population of interest. Our federated approach does not require iterative communications between sites, making it particularly suitable for research consortia with limited resources for developing automated data-sharing infrastructures. Compared to existing work data integration methods in causal inference, it allows distributional shifts in outcomes, treatments and baseline covariates distributions, and achieves semiparametric efficiency bound under appropriate conditions. We illustrate the magnitude of efficiency gains from incorporating extra data sources by examining the effect of insulin vs. non-insulin treatments on heart failure for patients with type II diabetes using electronic health record data collected from the All of Us program.
翻译:暂无翻译