Motivated by multi-center biomedical studies that cannot share individual data due to privacy and ownership concerns, we develop communication-efficient iterative distributed algorithms for estimation and inference in the high-dimensional sparse Cox proportional hazards model. We demonstrate that our estimator, even with a relatively small number of iterations, achieves the same convergence rate as the ideal full-sample estimator under very mild conditions. To construct confidence intervals for linear combinations of high-dimensional hazard regression coefficients, we introduce a novel debiased method, establish central limit theorems, and provide consistent variance estimators that yield asymptotically valid distributed confidence intervals. In addition, we provide valid and powerful distributed hypothesis tests for any coordinate element based on a decorrelated score test. We allow time-dependent covariates as well as censored survival times. Extensive numerical experiments on both simulated and real data lend further support to our theory and demonstrate that our communication-efficient distributed estimators, confidence intervals, and hypothesis tests improve upon alternative methods.
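To make the idea of communication-efficient iterative estimation concrete, here is a minimal sketch, not the algorithm developed in this work. It assumes a generic scheme in which each center computes the gradient of its own Cox negative log partial likelihood and shares only that p-dimensional vector, after which a coordinator takes a proximal (soft-thresholding) step to enforce sparsity. The function names (`cox_gradient`, `distributed_lasso_cox`, `simulate_site`), the penalty `lam`, the step size, and the simulated data are all illustrative assumptions. Risk sets are formed within each site, there is no tie handling, and covariates are time-fixed.

```python
import numpy as np

def cox_gradient(beta, X, time, event):
    """Gradient of the average negative log partial likelihood on one site.

    Assumes no tied event times and time-fixed covariates.
    """
    order = np.argsort(-time)              # descending time: risk sets become prefixes
    X, event = X[order], event[order]
    w = np.exp(X @ beta)                   # relative risks exp(x_i' beta)
    cum_w = np.cumsum(w)                   # sum of w over subjects still at risk
    cum_wx = np.cumsum(w[:, None] * X, axis=0)
    risk_mean = cum_wx / cum_w[:, None]    # weighted covariate mean over each risk set
    return -(event[:, None] * (X - risk_mean)).sum(axis=0) / len(time)

def soft_threshold(v, thresh):
    """Proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def distributed_lasso_cox(sites, p, lam=0.05, step=0.25, n_rounds=300):
    """Proximal gradient descent where sites share gradients only (hypothetical scheme)."""
    beta = np.zeros(p)
    for _ in range(n_rounds):
        # Each site sends one length-p gradient per round; raw data never leaves the site.
        grads = [cox_gradient(beta, X, t, d) for X, t, d in sites]
        avg_grad = np.mean(grads, axis=0)
        beta = soft_threshold(beta - step * avg_grad, step * lam)
    return beta

def simulate_site(rng, n, beta):
    """Simulated site data: exponential survival times with independent censoring."""
    X = rng.standard_normal((n, len(beta)))
    T = rng.exponential(1.0 / np.exp(X @ beta))   # hazard proportional to exp(x' beta)
    C = rng.exponential(2.0, size=n)              # censoring times
    return X, np.minimum(T, C), (T <= C).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_beta = np.zeros(10)
    true_beta[:2] = [1.0, -1.0]
    sites = [simulate_site(rng, 300, true_beta) for _ in range(3)]
    print(np.round(distributed_lasso_cox(sites, p=10), 2))
```

Because each site builds risk sets from its own subjects, the averaged gradient corresponds to a site-stratified partial likelihood, not the pooled full-sample one. The per-round communication cost is one vector of length p per site, regardless of the local sample size.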