Analyzing data from multiple sources offers valuable opportunities to estimate causal estimands more efficiently. However, such analysis also poses challenges arising from population heterogeneity and data privacy constraints. Although several advanced methods for causal inference in federated settings have been developed in recent years, many focus on difference-based average causal effects and are not designed to study effect modification. In this study, we introduce a novel targeted-federated learning framework for studying heterogeneity of treatment effects (HTE) in a target population, built on a projection-based estimand. The framework integrates information from multiple data sources without sharing raw data, while accounting for covariate distribution shifts among sources. We show that the proposed approach is doubly robust and that it supports both difference-based estimands for continuous outcomes and odds ratio-based estimands for binary outcomes. Furthermore, we develop a communication-efficient, bootstrap-based selection procedure to detect non-transportable data sources, which makes information aggregation more robust without introducing bias. Extensive simulation studies demonstrate that the proposed estimator outperforms existing methods, and a real-world application using nationwide Medicare-linked data illustrates the utility of the approach.