Studies comparing the survival of two or more groups using time-to-event data are of high importance in medical research. The gold standard is the log-rank test, which is optimal under proportional hazards. Because proportional hazards is a substantive assumption rather than a mere regularity condition, we evaluate the power of various statistical tests under different settings, including proportional and non-proportional hazards, with special emphasis on crossing hazards. This question has been studied for many years, and multiple methods have already been investigated in extensive simulation studies. In recent years, however, new omnibus tests and methods based on the restricted mean survival time have appeared and have been strongly recommended in the biometric literature. Thus, to give updated recommendations, we perform a large simulation study comparing tests that showed high power in previous studies with these more recent approaches. We analyze various simulation settings with varying survival and censoring distributions, unequal censoring between groups, small sample sizes, and unbalanced group sizes. Overall, omnibus tests are more robust in terms of power against deviations from the proportional hazards assumption.
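As a rough illustration of the kind of power simulation described above, the following minimal sketch estimates the rejection rate of a two-sample log-rank test under one crossing-hazards scenario. It is not the study's actual setup: the function names (`logrank_test`, `simulate_crossing`, `estimate_power`), the Weibull and exponential parameters, the censoring rate, the sample size, and the number of replications are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def logrank_test(times, events, groups):
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1, the group label of each subject
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t (overall and in group 1).
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        # Events occurring exactly at time t (overall and in group 1).
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def simulate_crossing(n_per_group, rng):
    """Hypothetical crossing-hazards scenario (parameters are assumptions).

    Group 0: exponential with hazard 1. Group 1: Weibull with scale 1 and
    shape 2, i.e. hazard 2t. The two hazards cross at t = 0.5.
    """
    times, events, groups = [], [], []
    for g in (0, 1):
        shape = 1.0 if g == 0 else 2.0
        for _ in range(n_per_group):
            t = rng.weibullvariate(1.0, shape)  # event time
            c = rng.expovariate(0.5)            # independent censoring time
            times.append(min(t, c))
            events.append(1 if t <= c else 0)
            groups.append(g)
    return times, events, groups


def estimate_power(n_per_group, n_sim, alpha=0.05, seed=1):
    """Monte Carlo rejection rate of the log-rank test in this scenario."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        _, p = logrank_test(*simulate_crossing(n_per_group, rng))
        rejections += p < alpha
    return rejections / n_sim


if __name__ == "__main__":
    print("estimated power:", estimate_power(n_per_group=50, n_sim=100))
```

Swapping `logrank_test` for another test with the same return shape (for example a weighted log-rank or a restricted-mean-survival-time comparison) would allow the same loop to compare power across methods under this or other assumed scenarios.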