For statistical inference on regression models with a diverging number of covariates, the existing literature typically makes sparsity assumptions on the inverse of the Fisher information matrix. Such assumptions, however, are often violated under Cox proportional hazards models, leading to biased estimates and confidence intervals that fall short of nominal coverage. We propose a modified debiased lasso approach, which solves a series of quadratic programming problems to approximate the inverse information matrix without imposing sparsity assumptions on that matrix. We establish asymptotic results for the estimated regression coefficients when the dimension of covariates diverges with the sample size. As demonstrated by extensive simulations, our proposed method provides consistent estimates and confidence intervals with nominal coverage probabilities. The utility of the method is further demonstrated by assessing the effects of genetic markers on patients' overall survival with the Boston Lung Cancer Survival Cohort, a large-scale epidemiological study investigating the mechanisms underlying lung cancer.
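To make the phrase "a series of quadratic programming problems" concrete, the display below sketches one formulation that is common in the debiased lasso literature. The abstract does not state the exact program, so every symbol here is an assumption introduced for illustration: $\ell_n$ denotes the negative log partial likelihood scaled by $1/n$, $\hat\beta$ the lasso estimate, $\hat\Sigma = \ddot\ell_n(\hat\beta)$ its Hessian, $e_j$ the $j$-th unit vector, and $\gamma_n$ a tuning parameter.

```latex
% Illustrative sketch only; notation is assumed, not taken from the abstract.
% One quadratic program per coordinate j = 1, ..., p gives row j of \hat\Theta:
\[
  \hat m_j \;=\; \arg\min_{m \in \mathbb{R}^p} \; m^\top \hat\Sigma\, m
  \quad \text{subject to} \quad
  \bigl\| \hat\Sigma\, m - e_j \bigr\|_\infty \le \gamma_n ,
  \qquad
  \hat\Theta = (\hat m_1, \dots, \hat m_p)^\top .
\]
% One-step bias correction of the lasso estimate:
\[
  \hat b \;=\; \hat\beta \;-\; \hat\Theta\, \dot\ell_n(\hat\beta) .
\]
% Wald-type interval for the j-th coefficient at level 1 - \alpha:
\[
  \hat b_j \;\pm\; z_{\alpha/2}\,
  \sqrt{ \bigl( \hat\Theta\, \hat\Sigma\, \hat\Theta^\top \bigr)_{jj} \big/ n } .
\]
```

Under this sketch, the constraint only requires $\hat\Sigma \hat m_j$ to be close to $e_j$ entrywise, so no row of the inverse information matrix needs to be sparse, which is consistent with the claim above that sparse-matrix assumptions are avoided.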