Accurately estimating personalized treatment effects within a single study site (e.g., a hospital) is challenging because of limited sample sizes. Furthermore, privacy considerations and a lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach that improves the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other, potentially heterogeneous, sites without requiring those sites to share subject-level data. To the best of our knowledge, there is no established model averaging approach for distributed data that focuses on improving the estimation of treatment effects. Specifically, under distributed data networks, our framework provides an interpretable tree-based ensemble of CATE estimators that combines models across study sites while actively modeling the heterogeneity in data sources through site partitioning. The performance of this approach is demonstrated in a real-world study of the causal effect of oxygen therapy on hospital survival rate and is supported by comprehensive simulation results.
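To make the idea of site partitioning concrete, the following toy sketch may help. It is not the implementation behind the proposed framework; it only illustrates one plausible reading of it under strong assumptions. The per-site CATE models are hypothetical hard-coded functions standing in for models fitted locally at each site, the target site applies them to its own covariates so that only models (not subject-level data) are exchanged, and a single small regression tree with one-vs-rest site splits stands in for the tree-based ensemble. All names (`site_models`, `target_x`, `build`, `predict`) are illustrative.

```python
from statistics import mean

# Hypothetical fitted CATE models, one per site (hard-coded for illustration).
site_models = {
    0: lambda x: 1.0 if x < 0.5 else 2.0,  # target site
    1: lambda x: 1.0 if x < 0.5 else 2.0,  # behaves like the target site
    2: lambda x: -1.0,                     # heterogeneous site
}
# Covariate values observed at the target site only.
target_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Augmented data: every site model evaluated on the target covariates.
rows = [(x, site, model(x)) for site, model in site_models.items() for x in target_x]


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def goes_left(row, kind, value):
    """Split rule: threshold on x, or one-vs-rest on the site label."""
    return row[0] < value if kind == "x" else row[1] == value


def build(rows, depth):
    """Greedy regression tree; a leaf is a float, a node is a 4-tuple."""
    taus = [r[2] for r in rows]
    if depth == 0 or len(set(taus)) == 1:
        return mean(taus)
    candidates = [("x", v) for v in sorted({r[0] for r in rows})[1:]]
    candidates += [("site", s) for s in sorted({r[1] for r in rows})]
    best = None
    for kind, value in candidates:
        left = [r for r in rows if goes_left(r, kind, value)]
        right = [r for r in rows if not goes_left(r, kind, value)]
        if not left or not right:
            continue
        loss = sse([r[2] for r in left]) + sse([r[2] for r in right])
        if best is None or loss < best[0]:
            best = (loss, kind, value, left, right)
    if best is None:
        return mean(taus)
    _, kind, value, left, right = best
    return (kind, value, build(left, depth - 1), build(right, depth - 1))


def predict(tree, x, site):
    """Walk the tree for covariate x with the site label fixed."""
    while isinstance(tree, tuple):
        kind, value, left, right = tree
        tree = left if goes_left((x, site), kind, value) else right
    return tree


tree = build(rows, depth=2)
print(tree[:2])               # ('site', 2): the dissimilar site is split off first
print(predict(tree, 0.3, 0))  # 1.0
print(predict(tree, 0.8, 0))  # 2.0
```

In this toy setting the first split isolates site 2, so predictions for the target site (site label 0) pool information only from the site whose model agrees with it. With noisy, estimated local models, one would expect an ensemble of such trees rather than a single tree to be used.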