Federated learning is an appealing framework for analyzing sensitive data from distributed health data networks. Under this framework, data partners at local sites collaboratively build an analytical model under the orchestration of a coordinating site, while keeping the data decentralized. Although integrating information from multiple sources can improve statistical efficiency, existing federated learning methods mostly assume that data across sites are homogeneous samples from a single global population, and so fail to account properly for the additional between-site variability in estimation and inference. Drawing on a multi-hospital electronic health records network, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators that combines results across hospital sites while explicitly modeling heterogeneity in data sources through site partitioning. The efficiency of this approach is demonstrated in a study of the causal effects of oxygen saturation on hospital mortality and is supported by comprehensive numerical results.