Massive survival datasets have become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates, for settings where the sample is extraordinarily large but computing resources are relatively limited. A subsample estimator is obtained by maximizing the weighted partial likelihood, and it is shown to be consistent and asymptotically normal. By minimizing the asymptotic mean squared error of the subsample estimator, the optimal subsampling probabilities are derived in explicit form. Simulation studies show that the proposed method satisfactorily approximates the full-data estimator. The proposed method is then applied to corporate loan and breast cancer datasets with different censoring rates, and the results confirm its practical advantages.
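The idea of a subsample estimator that maximizes a weighted partial likelihood can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions: covariates are time-fixed rather than time-dependent, ties are handled Breslow-style, and the sampling probabilities `probs` are supplied by the caller (uniform in the usage note) rather than computed from the optimal expressions derived in the paper. The function names `weighted_cox_fit`, `subsample_estimate`, and `simulate` are hypothetical.

```python
import numpy as np


def weighted_cox_fit(time, event, X, weights, n_iter=50, tol=1e-8):
    """Maximize a weighted Cox partial likelihood with Newton's method.

    Assumption: time-fixed covariates; tied times share one risk set.
    """
    order = np.argsort(time)  # ascending event/censoring times
    t, d, X, w = time[order], event[order], X[order], weights[order]
    # Index of the first observation sharing each time, so that the
    # risk set {j : t_j >= t_i} is a suffix of the sorted arrays.
    first = np.searchsorted(t, t, side="left")
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(X @ beta)
        # Reverse cumulative sums give weighted risk-set totals.
        S0 = np.cumsum(r[::-1])[::-1][first]
        S1 = np.cumsum((r[:, None] * X)[::-1], axis=0)[::-1][first]
        XX = X[:, :, None] * X[:, None, :]
        S2 = np.cumsum((r[:, None, None] * XX)[::-1], axis=0)[::-1][first]
        xbar = S1 / S0[:, None]
        wd = w * d  # only observed events contribute to the score
        grad = (wd[:, None] * (X - xbar)).sum(axis=0)
        var = S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = (wd[:, None, None] * var).sum(axis=0)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def subsample_estimate(time, event, X, probs, r, rng):
    """Draw r rows with replacement using probs, then fit with
    inverse-probability weights 1 / (r * probs)."""
    idx = rng.choice(len(time), size=r, replace=True, p=probs)
    w = 1.0 / (r * probs[idx])
    return weighted_cox_fit(time[idx], event[idx], X[idx], w)


def simulate(n, beta, rng):
    """Toy data: exponential event times, independent exponential censoring."""
    X = rng.normal(size=(n, len(beta)))
    T = rng.exponential(1.0 / np.exp(X @ beta))
    C = rng.exponential(1.0, size=n)
    return np.minimum(T, C), (T <= C).astype(float), X
```

A possible usage: generate data with `simulate`, compute the full-data estimate with `weighted_cox_fit` and unit weights, and compare it with `subsample_estimate` using `probs = np.full(n, 1.0 / n)`. Non-uniform probabilities can be passed through the same `probs` argument, which is where optimal subsampling probabilities would enter.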