Basis Pursuit (BP), Basis Pursuit DeNoising (BPDN), and LASSO are popular methods for identifying important predictors in the high-dimensional linear regression model, i.e., when the number of rows of the design matrix X is smaller than the number of columns. By definition, in the noiseless case BP uniquely recovers the vector of regression coefficients b if b is the unique vector with the smallest L1 norm among all vectors s satisfying Xs = Xb (the identifiability condition). In contrast, LASSO can recover the sign of b only under a much stronger irrepresentability condition. Meanwhile, it is known that the model selection properties of LASSO can be improved by hard-thresholding its estimates. This article supports these findings by proving that thresholded LASSO, thresholded BPDN, and thresholded BP recover the sign of b, in both the noisy and noiseless cases, if and only if b is identifiable and its nonzero entries are sufficiently large. In particular, if X has iid Gaussian entries and the number of predictors grows linearly with the sample size, then these thresholded estimators can recover the sign of b when the signal sparsity lies asymptotically below the Donoho-Tanner transition curve. This is in contrast to the regular LASSO, which asymptotically recovers the sign of b only when the signal sparsity tends to 0. Numerical experiments show that the identifiability condition, unlike the irrepresentability condition, does not appear to be affected by the structure of the correlations in the X matrix.
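For concreteness, a standard formulation of the objects discussed above is sketched here; the particular parametrization (noise-free response $y = Xb$ for BP, penalty parameter $\lambda > 0$, hard threshold $\tau > 0$) is a common convention assumed for illustration and is not quoted from the article itself.
\[
\hat b^{\mathrm{BP}} \in \operatorname*{arg\,min}_{s \,:\, Xs = y} \|s\|_1,
\qquad
\hat b^{\lambda} \in \operatorname*{arg\,min}_{s} \tfrac{1}{2}\|y - Xs\|_2^2 + \lambda\|s\|_1,
\]
where the second, penalized problem is the usual way of writing both LASSO and BPDN (the two names refer to equivalent penalized and constrained formulations). In this notation, the identifiability condition states that b is the unique solution of $\min_{s:\,Xs = Xb} \|s\|_1$, and hard-thresholding an estimate $\hat b$ keeps only its large components, $\hat b^{\tau}_j = \hat b_j \,\mathbf{1}\{|\hat b_j| > \tau\}$; it is the sign-recovery behavior of these thresholded estimators that the article characterizes.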