Semiparametric Marginal Regression for Clustered Competing Risks Data with Missing Cause of Failure

Clustered competing risks data are commonly encountered in multicenter studies. The analysis of such data is often complicated by informative cluster size, a situation in which the outcomes under study are associated with the size of the cluster. In addition, the cause of failure is frequently incompletely observed in real-world settings. To the best of our knowledge, no methodology exists for population-averaged analysis of clustered competing risks data with informative cluster size and missing causes of failure. To address this problem, we consider the semiparametric marginal proportional cause-specific hazards model and propose a maximum partial pseudolikelihood estimator under a missing at random assumption. To make the latter assumption more plausible in practice, we allow for auxiliary variables that may be related to the probability of missingness. The proposed method imposes no assumptions on the within-cluster dependence and allows for informative cluster size. The asymptotic properties of the proposed estimators for both the regression coefficients and infinite-dimensional parameters, such as the marginal cumulative incidence functions, are rigorously established. Simulation studies show that the proposed method performs well and that methods ignoring the within-cluster dependence and the informative cluster size lead to invalid inferences. The proposed method is applied to competing risks data from a large multicenter HIV study in sub-Saharan Africa in which a substantial proportion of causes of failure is missing.
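The abstract does not spell out the estimating equations. As an illustration only, the Python sketch below shows one common way to combine the two devices it names. First, each subject is weighted by the inverse of its cluster size, a standard guard against informative cluster size. Second, under missing at random, failures whose cause was observed are inverse-probability weighted. For cause k the resulting pseudo-partial-likelihood score is U_k(β) = Σ_i (1/n_i) Σ_j (R_ij/π_ij) I(failure of cause k) {Z_ij − S1(β, T_ij)/S0(β, T_ij)}, with S_r(β, t) = Σ_i (1/n_i) Σ_j Y_ij(t) exp(β Z_ij) Z_ij^r. Here n_i is the size of cluster i, R_ij indicates that the cause of failure was observed, π_ij is the probability of that given failure, and Y_ij(t) is the at-risk indicator.

This is not claimed to be the authors' estimator. It assumes a single scalar covariate and Breslow handling of ties. It also treats the observation probabilities π_ij (`p_obs`) as known, whereas in practice they would be estimated, for example from covariates and auxiliary variables. No standard errors or cumulative incidence functions are computed. The names `Subject`, `score_and_info` and `fit_cause_specific` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical illustration, not the paper's estimator: cluster-size-weighted,
# inverse-probability-weighted partial-likelihood score for a cause-specific
# proportional hazards model with a single scalar covariate.


@dataclass
class Subject:
    cluster: int          # cluster (e.g., clinic) identifier
    time: float           # observed time: failure or censoring, whichever came first
    failed: bool          # True if a failure from any cause was observed
    cause: Optional[int]  # observed cause; None if censored or if the cause is missing
    z: float              # scalar covariate
    p_obs: float = 1.0    # ASSUMED known P(cause observed | failure, time, z, auxiliaries)


def _weights(data: List[Subject], k: int) -> Tuple[List[float], List[float]]:
    """Return (risk-set weights, cause-k event weights)."""
    sizes = Counter(s.cluster for s in data)
    # Inverse cluster size: each cluster contributes equally overall.
    w = [1.0 / sizes[s.cluster] for s in data]
    # Failures of cause k with an observed cause are up-weighted by 1 / p_obs.
    # Failures with a missing cause, failures of other causes, and censored
    # subjects get event weight 0 but remain in the risk sets.
    e = [wi / s.p_obs if s.failed and s.cause == k else 0.0 for wi, s in zip(w, data)]
    return w, e


def score_and_info(data: List[Subject], k: int, beta: float) -> Tuple[float, float]:
    """Weighted score U(beta) and its negative derivative I(beta)."""
    w, e = _weights(data, k)
    u = info = 0.0
    for ej, s in zip(e, data):
        if ej == 0.0:
            continue
        # Breslow-type risk set: subjects still under observation at s.time.
        s0 = s1 = s2 = 0.0
        for wl, r in zip(w, data):
            if r.time >= s.time:
                risk = wl * math.exp(beta * r.z)
                s0 += risk
                s1 += risk * r.z
                s2 += risk * r.z ** 2
        mean = s1 / s0
        u += ej * (s.z - mean)
        info += ej * (s2 / s0 - mean ** 2)
    return u, info


def fit_cause_specific(data: List[Subject], k: int, beta: float = 0.0,
                       tol: float = 1e-10, max_iter: int = 50) -> float:
    """Solve U(beta) = 0 by Newton-Raphson."""
    for _ in range(max_iter):
        u, info = score_and_info(data, k, beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    toy = [
        Subject(cluster=0, time=1.0, failed=True, cause=1, z=1.0),
        Subject(cluster=1, time=2.0, failed=True, cause=1, z=0.0),
        Subject(cluster=2, time=3.0, failed=True, cause=1, z=1.0),
        Subject(cluster=3, time=4.0, failed=False, cause=None, z=0.0),
    ]
    print(round(fit_cause_specific(toy, k=1), 4))  # 0.9406
```

Suppose there is one subject per cluster and every `p_obs` equals 1. Then all weights equal 1 and the score reduces to the ordinary Cox partial-likelihood score for cause k. On the toy data the estimate solves x² − x − 4 = 0 for x = exp(β). Failures from other causes enter only through the risk sets, as censoring does in a cause-specific hazards model.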
