Statistical inference on the variation of an outcome explained by a set of covariates is of particular interest in practice. When the covariates are of moderate to high dimension and their effects are not sparse, several approaches have been proposed for estimation and inference. A major problem with the existing approaches is that their inference procedures are not robust to violations of the normality assumptions on the covariates and the residual errors. In this paper, we propose an estimating equation approach to estimation and inference on the explained variation in the high-dimensional linear model. Unlike the existing approaches, the proposed approach does not rely on restrictive normality assumptions for inference. We show that the proposed estimator is consistent and asymptotically normally distributed under reasonable conditions. Simulation studies demonstrate better performance of the proposed inference procedure compared with the existing approaches. The proposed approach is applied to study the variation of glycohemoglobin explained by environmental pollutants in a National Health and Nutrition Examination Survey data set.