Residual diagnostic methods play a critical role in assessing model assumptions and detecting outliers in statistical modelling. For survival models with censored observations, Li et al. (2021) introduced the Z-residual, which is approximately normally distributed under the true model. This property makes it possible to use Z-residuals for diagnosing survival models in much the same way that Pearson residuals are used in normal regression. However, computing residuals from the full dataset can introduce a conservative bias that reduces the power to detect model mis-specification, because the same data are used for both model fitting and validation. Although cross-validation is a potential solution to this problem, it has not been commonly used in residual diagnostics because of its computational cost. In this paper, we propose a cross-validation approach for computing Z-residuals in the context of shared frailty models. Specifically, we develop a general function that calculates cross-validatory Z-residuals using the output from the \texttt{coxph} function in the \texttt{survival} package in R. Our simulation studies demonstrate that, for goodness-of-fit tests and outlier detection, cross-validatory Z-residuals are significantly more powerful and more discriminative than Z-residuals without cross-validation. We also compare the performance of Z-residuals with and without cross-validation in identifying outliers in a real application that models the recurrence time of infection in kidney patients. Our findings suggest that cross-validatory Z-residuals can identify outliers that are missed by Z-residuals without cross-validation.
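For orientation, the Z-residual can be sketched as follows. The notation is not given in the abstract and is assumed here for illustration: $t_i$ is the observed time for observation $i$, $d_i$ is the event indicator ($1$ for an observed event, $0$ for a censored time), $S_i(\cdot)$ is the survival function under the fitted model, $U_i$ is an independent $\mathrm{Uniform}(0,1)$ draw, and $\Phi$ is the standard normal distribution function. Under these assumptions, the randomized survival probability and the resulting Z-residual take the form

\[
S_i^{R}(t_i, d_i, U_i) =
\begin{cases}
S_i(t_i), & d_i = 1,\\
U_i \, S_i(t_i), & d_i = 0,
\end{cases}
\qquad
r_i^{Z} = -\Phi^{-1}\!\bigl(S_i^{R}(t_i, d_i, U_i)\bigr).
\]

In a cross-validatory version, $S_i$ would be replaced by an estimate fitted with the fold containing observation $i$ held out, so that $t_i$ does not influence the model against which it is checked. This is a generic K-fold reading of the idea; the exact fold construction for shared frailty models (for example, how clusters are split across folds) is not specified in the abstract and may differ.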