Motivated by multi-center biomedical studies that cannot share individual data due to privacy and ownership concerns, we develop communication-efficient iterative distributed algorithms for estimation and inference in the high-dimensional sparse Cox proportional hazards model. We demonstrate that our estimator, even with a relatively small number of iterations, achieves the same convergence rate as the ideal full-sample estimator under very mild conditions. To construct confidence intervals for linear combinations of high-dimensional hazard regression coefficients, we introduce a novel debiased method, establish central limit theorems, and provide consistent variance estimators that yield asymptotically valid distributed confidence intervals. In addition, we provide valid and powerful distributed hypothesis tests for any individual coordinate of the coefficient vector, based on a decorrelated score test. We allow time-dependent covariates as well as censored survival times. Extensive numerical experiments on both simulated and real data lend further support to our theory and demonstrate that our communication-efficient distributed estimators, confidence intervals, and hypothesis tests improve upon alternative methods.
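The abstract does not spell out the algorithm, so the following is only a rough illustration of the communication pattern such methods build on, not the authors' estimator. It rests on assumptions that are ours rather than the paper's: each site keeps its own baseline hazard (a stratified Cox model, so the pooled negative log partial likelihood is a sample-size-weighted sum of site-level terms), covariates are time-fixed, risk sets follow the Breslow convention, and a plain lasso proximal-gradient update stands in for the paper's iterative method. Only a p-dimensional gradient leaves each site per round. All function names (`cox_loss`, `cox_gradient`, `aggregate`, `lasso_prox_step`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cox_loss(times: Sequence[float], events: Sequence[int], X: Matrix, beta: Vector) -> float:
    """Site-level negative log partial likelihood divided by n (Breslow risk sets)."""
    n = len(times)
    eta = [dot(x, beta) for x in X]
    total = 0.0
    for i in range(n):
        if events[i]:
            # Risk set R_i: subjects whose observed time is >= t_i.
            denom = sum(math.exp(eta[j]) for j in range(n) if times[j] >= times[i])
            total += eta[i] - math.log(denom)
    return -total / n


def cox_gradient(times: Sequence[float], events: Sequence[int], X: Matrix, beta: Vector) -> Vector:
    """Gradient of cox_loss; in this sketch, this p-vector is all a site transmits."""
    n, p = len(times), len(beta)
    w = [math.exp(dot(x, beta)) for x in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored subjects enter only through other subjects' risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk)
        for k in range(p):
            risk_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= X[i][k] - risk_mean
    return [g / n for g in grad]


def aggregate(site_grads: List[Vector], site_sizes: List[int]) -> Vector:
    """Sample-size-weighted average: the gradient of the pooled stratified loss."""
    total = sum(site_sizes)
    p = len(site_grads[0])
    return [sum(n_k * g[k] for g, n_k in zip(site_grads, site_sizes)) / total for k in range(p)]


def soft_threshold(z: float, t: float) -> float:
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_prox_step(beta: Vector, grad: Vector, step: float, lam: float) -> Vector:
    """One proximal-gradient step for loss + lam * ||beta||_1."""
    return [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]


if __name__ == "__main__":
    # Toy data for two hypothetical sites: (times, event indicators, covariates).
    sites = [
        ([2.0, 3.0, 5.0, 7.0], [1, 0, 1, 1], [[0.5, 1.0], [-1.0, 0.2], [0.3, -0.4], [1.2, 0.0]]),
        ([1.0, 4.0, 6.0], [1, 1, 0], [[0.0, 0.5], [0.8, -1.0], [-0.5, 0.3]]),
    ]
    beta = [0.0, 0.0]
    for _ in range(20):
        # One communication round: every site sends only its local gradient.
        grads = [cox_gradient(t, d, X, beta) for t, d, X in sites]
        g = aggregate(grads, [len(t) for t, _, _ in sites])
        beta = lasso_prox_step(beta, g, step=0.5, lam=0.05)
    print(beta)
```

The weighted average in `aggregate` is exact only under the stratified assumption above; with a shared baseline hazard the risk sets would span sites and a single gradient exchange would no longer reproduce the pooled gradient. Debiasing, variance estimation, and the decorrelated score test described in the abstract are not reproduced here.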