In this paper, we propose a novel variable selection approach for high-dimensional linear models in which the columns of the design matrix are highly correlated. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the columns of the design matrix and then applying a generalized Elastic Net criterion, which can be seen as an extension of the generalized Lasso. The properties of our approach, called gEN (generalized Elastic Net), are investigated from both a theoretical and a numerical point of view. More precisely, we provide a new condition called GIC (Generalized Irrepresentable Condition), which generalizes the EIC (Elastic Net Irrepresentable Condition) of Jia and Yu (2010) and under which we prove that our estimator recovers the positions of the null and non-null entries of the coefficients when the sample size tends to infinity. We also assess the performance of our methodology on synthetic data and compare it with alternative approaches. Our numerical experiments show that our approach improves variable selection performance in many cases.
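As a rough illustration of the two steps described above, the decorrelation and the penalized criterion might be written as follows. This is only a sketch under assumptions not stated in the abstract: $\Sigma$ denotes an invertible correlation matrix of the columns of $X$, $\lambda$ and $\eta$ are nonnegative tuning parameters, and the exact criterion used in the paper may differ.

```latex
% Assumed notation (not taken from the abstract): y is the n-vector of
% responses, X the n x p design matrix, \Sigma the (assumed invertible)
% correlation matrix of the columns of X, \lambda, \eta >= 0 tuning parameters.
\begin{align*}
y &= X\beta + \varepsilon
   = \underbrace{X\Sigma^{-1/2}}_{\widetilde{X}}\,
     \underbrace{\Sigma^{1/2}\beta}_{\widetilde{\beta}} + \varepsilon, \\
\widehat{\widetilde{\beta}}
  &\in \operatorname*{arg\,min}_{b \in \mathbb{R}^p}
     \left\{ \|y - \widetilde{X} b\|_2^2
       + \lambda \|\Sigma^{-1/2} b\|_1
       + \eta \|b\|_2^2 \right\}, \qquad
\widehat{\beta} = \Sigma^{-1/2}\,\widehat{\widetilde{\beta}}.
\end{align*}
```

In this sketch, setting $\eta = 0$ gives a generalized Lasso, since the $\ell_1$ penalty acts on the linear transform $\Sigma^{-1/2} b$ rather than on $b$ itself; the additional ridge term is what would make it an Elastic Net type criterion.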