Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this semi-supervised framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a multiple blockwise imputation procedure, and obtain its rates of convergence. Furthermore, building upon an innovative semi-supervised projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose nearly unbiased estimators for the individual regression coefficients that are asymptotically normally distributed under mild conditions. By carefully analyzing these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
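To make the data setting concrete, the following is a minimal simulation sketch of blockwise missing covariates with a partially observed response. It is not the estimator proposed in the paper. The function name `simulate_blockwise_missing`, the block sizes, the sample sizes, and the uniformly random choice of observed blocks are all illustrative assumptions.

```python
import numpy as np

def simulate_blockwise_missing(n=300, block_sizes=(4, 3, 5), n_labeled=100, seed=0):
    """Simulate a linear model with blockwise missing covariates and a
    partially observed response (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    p = sum(block_sizes)
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:3] = [1.0, -0.5, 0.8]          # sparse coefficient vector (assumed)
    y = X @ beta + rng.normal(size=n)

    # Column boundaries of each source/modality block.
    edges = np.cumsum((0,) + tuple(block_sizes))

    # patterns[i, k] is True when sample i has block k observed.
    patterns = rng.integers(0, 2, size=(n, len(block_sizes))).astype(bool)
    patterns[~patterns.any(axis=1), 0] = True   # keep at least one block per sample

    # A missing block removes all of its covariates at once.
    X_obs = X.copy()
    for k in range(len(block_sizes)):
        X_obs[~patterns[:, k], edges[k]:edges[k + 1]] = np.nan

    # Semi-supervised setting: only the first n_labeled responses are observed.
    y_obs = y.copy()
    y_obs[n_labeled:] = np.nan
    return X_obs, y_obs, patterns

X_obs, y_obs, patterns = simulate_blockwise_missing()

# Complete cases: every covariate block and the response are observed.
complete = ~np.isnan(X_obs).any(axis=1) & ~np.isnan(y_obs)
print("samples:", X_obs.shape[0])
print("labeled samples:", int((~np.isnan(y_obs)).sum()))
print("complete cases:", int(complete.sum()))
```

In a simulation like this, a complete-case analysis keeps only the samples counted in `complete`, which is typically a small fraction of the data. That loss is what motivates methods that use partially observed and unlabeled samples.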