The Cox model, which remains the first choice for analyzing time-to-event data even for large datasets, relies on the proportional hazards (PH) assumption. When survival data arrive sequentially in chunks, a fast and minimally storage-intensive approach to testing the PH assumption is desirable. We propose an online updating approach that updates the standard test statistic as each new block of data becomes available and greatly lightens the computational burden. Under the null hypothesis of PH, the proposed statistic is shown to have the same asymptotic distribution as the standard version computed on the entire data stream with the data blocks pooled into one dataset. In simulation studies, the test and its variant based on the most recent data blocks maintain their sizes when the PH assumption holds and have substantial power to detect different violations of the PH assumption. We also show in simulation that our approach can be used successfully with "big data" that exceed a single computer's computational resources. The approach is illustrated with a survival analysis of patients with lymphoma from the Surveillance, Epidemiology, and End Results Program. The proposed test promptly identified deviation from the PH assumption that was not captured by the test based on the entire data.
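The storage-light idea behind online updating can be illustrated with a minimal sketch. This is not the statistic proposed in the paper. It assumes, purely for illustration, that each data block can be reduced to a scalar score-type contribution and a matching variance contribution, and that only the running sums of these two quantities are kept. The class name `OnlineScoreTest`, its fields, and the example numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineScoreTest:
    """Running one-degree-of-freedom score-type test.

    Illustrative assumption: each block contributes a scalar score and a
    scalar variance term. This is NOT the paper's actual test statistic.
    """
    score_sum: float = 0.0  # cumulative sum of per-block score contributions
    info_sum: float = 0.0   # cumulative sum of per-block variance contributions
    n_blocks: int = 0       # number of blocks processed so far

    def update(self, block_score: float, block_info: float) -> None:
        """Fold one block's summaries into the running sums, then discard the block."""
        if block_info <= 0:
            raise ValueError("block_info must be positive")
        self.score_sum += block_score
        self.info_sum += block_info
        self.n_blocks += 1

    def statistic(self) -> float:
        """Cumulative statistic: squared summed score over summed variance."""
        if self.n_blocks == 0:
            raise ValueError("no blocks have been processed yet")
        return self.score_sum ** 2 / self.info_sum

    def p_value(self) -> float:
        """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
        return math.erfc(math.sqrt(self.statistic() / 2.0))


if __name__ == "__main__":
    test = OnlineScoreTest()
    # Hypothetical per-block summaries (score, variance); not real data.
    for score, info in [(0.5, 4.0), (1.5, 5.0), (1.0, 7.0)]:
        test.update(score, info)
        print(test.n_blocks, test.statistic(), test.p_value())
```

Because only two running sums and a counter are stored, the memory cost does not grow with the number of blocks, which is the property the abstract describes as minimally storage-intensive. A variant restricted to the most recent blocks would additionally need to keep the per-block summaries inside the window so that old contributions can be subtracted.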