We consider random sample splitting for estimation and inference in high-dimensional generalized linear models: we first apply the lasso to one subsample to select a submodel, and then apply the debiased lasso to the remaining subsample to fit the selected model. We show that, whether or not a prespecified subset of regression coefficients is included, the debiased lasso estimator of the selected submodel after a single split is asymptotically normal. Furthermore, for a set of prespecified regression coefficients, we show that a multiple splitting procedure based on the debiased lasso can address the loss of efficiency associated with sample splitting and produce asymptotically normal estimates under mild conditions. Our simulation results indicate that using the debiased lasso instead of the standard maximum likelihood estimator in the estimation stage can vastly reduce the bias and variance of the resulting estimates. We illustrate the proposed multiple splitting debiased lasso method with an analysis of the smoking data from the Mid-South Tobacco Case-Control Study.
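The select-then-fit structure described above can be sketched in a few lines. This is only an illustration under strong simplifying assumptions, not the procedure studied here: it uses a Gaussian linear model rather than a general GLM, it refits the selected submodel by ordinary least squares where the paper uses the debiased lasso, it combines splits by a plain average, and it omits variance estimation entirely. All function names (`lasso_cd`, `ols`, `multi_split_estimate`, `simulate`) and the tuning value `lam` are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (linear model only)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date after each coordinate update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def ols(X, y):
    """Least squares via the normal equations and Gauss-Jordan elimination.

    Assumption: stands in for the debiased lasso refit, which is not implemented here.
    """
    n, k = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[r][k] / A[r][r] for r in range(k)]


def multi_split_estimate(X, y, target, lam, n_splits=10, seed=0):
    """Average the refitted coefficient of a prespecified `target` column over random splits."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        sel_idx, fit_idx = idx[: n // 2], idx[n // 2:]
        # Stage 1: lasso on one half selects a submodel.
        b = lasso_cd([X[i] for i in sel_idx], [y[i] for i in sel_idx], lam)
        # The prespecified coefficient is always kept in the submodel.
        selected = sorted({j for j, v in enumerate(b) if v != 0.0} | {target})
        # Stage 2: refit the submodel on the other half (OLS here, as an assumption).
        Xs = [[X[i][j] for j in selected] for i in fit_idx]
        coef = ols(Xs, [y[i] for i in fit_idx])
        estimates.append(coef[selected.index(target)])
    return sum(estimates) / len(estimates)


def simulate(n, p, seed=0):
    """Toy data: only the first two of p Gaussian covariates have nonzero effects."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate(n=120, p=20, seed=1)
    print(multi_split_estimate(X, y, target=0, lam=0.2, n_splits=10, seed=1))
```

Because selection and fitting use disjoint halves, the refit in each split is not contaminated by the selection step. Averaging over several random splits is what recovers some of the efficiency lost by fitting on only half of the data.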