The focus of modern biomedical studies has gradually shifted to the explanation and estimation of joint effects of high-dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into prevention strategies or treatment decisions for both patients and physicians. High-dimensional inference, including confidence intervals and hypothesis testing, has sparked much interest. While much work has been done in the linear regression setting, there is a lack of literature on inference for high-dimensional generalized linear models. We propose a novel and computationally feasible method that accommodates a variety of outcome types, including normal, binomial, and Poisson data. We use a "splitting and smoothing" approach, which splits the sample into two parts, performs variable selection on one part, and conducts partial regression on the other. By averaging the estimates over multiple random splits, we obtain smoothed estimates that are numerically stable. We show that the smoothed estimates are consistent and asymptotically normal, and we construct confidence intervals with proper coverage probabilities for all predictors. We examine the finite-sample performance of our method by comparing it with existing methods and by applying it to a lung cancer cohort study.
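The splitting-and-smoothing idea can be illustrated with a minimal NumPy sketch, under several assumptions that are not taken from the abstract. The abstract does not name the variable-selection procedure, so the hypothetical helper `screen_top_k` uses simple marginal-correlation screening (keep the `k` columns most correlated with the outcome) as a stand-in. "Partial regression" is interpreted here as fitting, on the held-out half, an unpenalized canonical-link GLM of the outcome on predictor `j` plus the selected set, by iteratively reweighted least squares (IRLS), and recording the coefficient of predictor `j`. The function names and parameters (`irls_glm`, `split_and_smooth`, `n_splits`, `k`) are illustrative only. The sketch smooths point estimates and does not reproduce the authors' variance estimator or confidence intervals.

```python
import numpy as np


def irls_glm(X, y, family="gaussian", n_iter=50, tol=1e-8):
    """Unpenalized canonical-link GLM fit by IRLS; X must include an intercept column.

    Supported families: 'gaussian' (identity), 'binomial' (logit), 'poisson' (log).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        if family == "gaussian":
            mu, w = eta, np.ones_like(eta)
        elif family == "binomial":
            mu = 1.0 / (1.0 + np.exp(-eta))
            w = mu * (1.0 - mu)
        elif family == "poisson":
            mu = np.exp(eta)
            w = mu
        else:
            raise ValueError(f"unsupported family: {family}")
        w = np.maximum(w, 1e-10)           # guard against zero weights
        z = eta + (y - mu) / w             # working response (canonical link)
        XtW = X.T * w                      # X^T W with W = diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def screen_top_k(X, y, k):
    """Hypothetical stand-in selector: the k columns with the largest |corr(X_j, y)|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return set(np.argsort(score)[::-1][:k].tolist())


def split_and_smooth(X, y, family="gaussian", n_splits=100, k=5, seed=0):
    """Average, over random half-splits, the partial-regression estimate of every beta_j."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    est = np.zeros((n_splits, p))
    for b in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]
        S = screen_top_k(X[sel_idx], y[sel_idx], k)    # selection on one half
        for j in range(p):
            # Partial regression on the other half: X_j plus the selected set (minus j).
            cols = [j] + sorted(S - {j})
            Z = np.column_stack([np.ones(len(fit_idx)), X[np.ix_(fit_idx, cols)]])
            est[b, j] = irls_glm(Z, y[fit_idx], family)[1]  # coefficient of X_j
    return est.mean(axis=0)                               # smoothing step


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 400, 20
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [1.0, -1.0, 0.5]
    y = X @ beta_true + rng.standard_normal(n)
    print(np.round(split_and_smooth(X, y, "gaussian", n_splits=50, k=5), 2))
```

Every predictor, selected or not, receives an estimate because each `j` is refit with the selected set as adjustment covariates; the price is `p` small GLM fits per split. Passing `family="binomial"` or `family="poisson"` switches the partial-regression step to the corresponding outcome type.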