In oncology clinical trials, overall survival (OS) is the gold-standard endpoint, but long follow-up and treatment switching can delay or dilute detectable effects. Progression-free survival (PFS) often provides earlier evidence and is therefore frequently used together with OS as one of multiple primary endpoints. When trial success can be declared as soon as at least one of the two hypotheses is rejected, a correction for multiple testing may be deemed necessary. Because PFS and OS are generally highly dependent, their test statistics are typically correlated. Ignoring this dependence (e.g., via a simple Bonferroni correction) is not power-optimal. We develop a group-sequential testing procedure for the multiple primary endpoints PFS and OS that fully exhausts the family-wise error rate (FWER) by exploiting their dependence. Specifically, we characterize the joint asymptotic distribution of log-rank statistics across endpoints and multiple event-driven analysis cutoffs. Furthermore, we show that the covariance structure can be estimated consistently. Embedding these results in a closed testing procedure, we recalculate the critical values of the test statistics so that the available type I error is spent optimally. An important extension of the existing literature is that we allow both interim and final analyses to be event-driven. Simulations based on illness-death multi-state models empirically confirm FWER control for moderate to large sample sizes. Compared with a simple Bonferroni correction, the proposed methods recover roughly two thirds of the power loss for OS, increase disjunctive and conjunctive power, and enable meaningful early stopping. In planning, these gains translate into about 5% fewer OS events required to reach the targeted power. We also discuss practical issues in the implementation of such designs and possible extensions of the introduced method.
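To illustrate why exploiting the dependence between the two test statistics recovers power, the following minimal sketch considers a deliberately simplified setting: a single analysis with two one-sided test statistics that are jointly standard normal with a known correlation. It is not the group-sequential closed testing procedure described above. The correlation `rho = 0.6`, the level `alpha = 0.025`, and all function names are illustrative assumptions, not values or identifiers taken from this work. The code solves for the common critical value `c` such that the probability of rejecting at least one true null hypothesis equals `alpha`, and compares it with the Bonferroni critical value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def joint_cdf(c, rho, n_steps=2000):
    """P(Z1 <= c, Z2 <= c) for a standard bivariate normal with correlation rho.

    Uses P = integral over z <= c of phi(z) * Phi((c - rho*z) / sqrt(1 - rho^2)),
    evaluated with composite Simpson's rule (n_steps must be even, |rho| < 1).
    """
    lower = -8.0  # the normal density is negligible below this point
    h = (c - lower) / n_steps
    scale = (1.0 - rho ** 2) ** 0.5

    def integrand(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((c - rho * z) / scale)

    total = integrand(lower) + integrand(c)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def fwer(c, rho):
    """Probability that at least one statistic exceeds c under both null hypotheses."""
    return 1.0 - joint_cdf(c, rho)


def common_critical_value(alpha, rho, n_iter=50):
    """Bisection for the common critical value c with fwer(c, rho) = alpha."""
    lo = STD_NORMAL.inv_cdf(1.0 - alpha)        # unadjusted: FWER >= alpha
    hi = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # Bonferroni: FWER <= alpha
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if fwer(mid, rho) > alpha:
            lo = mid  # too liberal, raise the threshold
        else:
            hi = mid  # conservative enough, lower the threshold
    return 0.5 * (lo + hi)


alpha = 0.025  # assumed one-sided FWER level
rho = 0.6      # assumed (hypothetical) correlation between the two statistics

c_bonferroni = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
c_adjusted = common_critical_value(alpha, rho)

print(f"Bonferroni critical value:   {c_bonferroni:.4f}")
print(f"Correlation-adjusted value:  {c_adjusted:.4f}")
print(f"FWER at Bonferroni value:    {fwer(c_bonferroni, rho):.5f}")
print(f"FWER at adjusted value:      {fwer(c_adjusted, rho):.5f}")
```

Under positive correlation the adjusted critical value lies below the Bonferroni value while the FWER is still held at `alpha`, which is the source of the power gain in this toy setting. In the actual procedure the correlation is not assumed but estimated from the data, and the calculation extends across several event-driven analyses.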