Traditional variable selection methods can fail to be sign consistent when the irrepresentable conditions are violated. This is especially critical in high-dimensional settings, where the number of predictors exceeds the sample size. In this paper, we propose a new semi-standard partial covariance (SPAC) approach that reduces correlation effects from other covariates while fully capturing the magnitude of the coefficients. The proposed SPAC effectively selects covariates that have direct effects on the response variable while eliminating predictors that are not directly associated with the response but are highly correlated with the relevant predictors. We show that the proposed SPAC method with the Lasso penalty or the smoothly clipped absolute deviation (SCAD) penalty possesses strong sign consistency in high-dimensional settings. Numerical studies and an application to post-traumatic stress disorder data also confirm that the proposed method outperforms the existing Lasso, adaptive Lasso, SCAD, Peter-Clark-simple algorithm, and factor-adjusted regularized model selection methods when the irrepresentable conditions fail.
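The SPAC statistic itself is not defined in this text, so the sketch below does not implement it. It only illustrates, under an assumed Gaussian linear model Y = X beta + eps with a hypothetical three-predictor covariance matrix, the two population quantities contrasted above: the strong irrepresentable condition for the Lasso, max |Sigma_{S^c,S} Sigma_{S,S}^{-1} sign(beta_S)| < 1, and the (unscaled) partial covariance Cov(Y, X_j | X_{-j}). The function names, the covariance matrix, and rho = 0.6 are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def irrepresentable_value(sigma, support, sign_beta):
    """Return max_j |(Sigma_{S^c,S} Sigma_{S,S}^{-1} sign(beta_S))_j|.

    The strong irrepresentable condition for the Lasso requires this value
    to be bounded away from 1 (at most 1 - eta for some eta > 0).
    """
    p = sigma.shape[0]
    s = np.asarray(support)
    sc = np.setdiff1d(np.arange(p), s)  # indices of the irrelevant predictors
    sig_ss = sigma[np.ix_(s, s)]
    sig_cs = sigma[np.ix_(sc, s)]
    v = sig_cs @ np.linalg.solve(sig_ss, np.asarray(sign_beta, dtype=float))
    return float(np.max(np.abs(v)))


def partial_cov_with_response(sigma, beta, j):
    """Cov(Y, X_j | X_{-j}) when Y = X @ beta + eps, eps independent of X.

    Computed as a Schur complement of the joint covariance of (X, Y);
    the noise variance does not enter this quantity.
    """
    p = sigma.shape[0]
    rest = np.setdiff1d(np.arange(p), [j])
    cov_xy = sigma @ beta  # Cov(X, Y)
    s_rr = sigma[np.ix_(rest, rest)]
    adjustment = cov_xy[rest] @ np.linalg.solve(s_rr, sigma[rest, j])
    return float(cov_xy[j] - adjustment)


# Hypothetical example: X1 and X2 are relevant and uncorrelated; X3 has no
# direct effect but is correlated (rho = 0.6) with both of them.
rho = 0.6
sigma = np.array([[1.0, 0.0, rho],
                  [0.0, 1.0, rho],
                  [rho, rho, 1.0]])
beta = np.array([1.0, 1.0, 0.0])

ic = irrepresentable_value(sigma, support=[0, 1], sign_beta=[1, 1])
marginal = sigma @ beta  # marginal Cov(X_j, Y)
partial = [partial_cov_with_response(sigma, beta, j) for j in range(3)]

print(f"irrepresentable value: {ic:.3f}")  # 2 * rho = 1.2 > 1: condition violated
print("marginal Cov(X_j, Y):", np.round(marginal, 4))
print("partial Cov(X_j, Y | rest):", np.round(partial, 4))
```

In this toy design X3 has the largest marginal covariance with Y, yet its partial covariance given the other predictors is zero, while X1 and X2 keep nonzero partial covariances. This is the separation between directly relevant and merely correlated predictors described above; the semi-standardization that SPAC applies to the partial covariance is not reproduced here.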