It is known that the Thresholded Lasso (TL), SCAD and MCP correct the intrinsic estimation bias of the Lasso. In this paper we propose an alternative method of improving the Lasso for predictive models with general convex loss functions, which encompass normal linear models, logistic regression, quantile regression and support vector machines. For a given penalty we order the absolute values of the non-zero Lasso coefficients and then select the final model from the resulting small nested family using the Generalized Information Criterion (GIC). We derive exponential upper bounds on the selection error of the method. These results confirm that, at least for normal linear models, our algorithm can serve as a benchmark for the theory of model selection, since it is constructive, computationally efficient and leads to consistent model selection under weak assumptions. Constructivity of the algorithm means that, in contrast to the TL, SCAD and MCP, consistent selection does not rely on unknown parameters such as the cone invertibility factor; instead, our algorithm needs only the sample size, the number of predictors and an upper bound on the noise parameter. We show in numerical experiments on synthetic and real-world data sets that an implementation of our algorithm is more accurate than implementations of the studied concave regularizations. Our procedure is implemented in the R package "DMRnet", available from the CRAN repository.
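To make the selection step concrete, the following is a minimal sketch of the ordering-plus-GIC procedure for the normal linear model, assuming the glmnet package for the Lasso fit; the helper name ss_gic and the GIC penalty constant a_n below are illustrative choices, not the exact quantities derived in the paper.

```r
## Sketch: order the non-zero Lasso coefficients by absolute value,
## refit each nested model, and pick the one minimizing the GIC.
## Assumptions: Gaussian loss; a_n = log(p) * log(n) is illustrative.
library(glmnet)

ss_gic <- function(x, y, lambda, a_n = log(ncol(x)) * log(nrow(x))) {
  n    <- nrow(x)
  fit  <- glmnet(x, y, lambda = lambda)            # Lasso at the given penalty
  beta <- as.numeric(coef(fit, s = lambda))[-1]    # drop the intercept
  supp <- which(beta != 0)
  ord  <- supp[order(abs(beta[supp]), decreasing = TRUE)]  # order |beta_j|

  rss0 <- sum((y - mean(y))^2)                     # intercept-only model
  best <- list(gic = n * log(rss0 / n), model = integer(0))
  for (k in seq_along(ord)) {                      # small nested family
    J   <- ord[seq_len(k)]
    ols <- lm(y ~ x[, J, drop = FALSE])            # refit without shrinkage
    rss <- sum(residuals(ols)^2)
    gic <- n * log(rss / n) + a_n * k              # GIC for Gaussian loss
    if (gic < best$gic) best <- list(gic = gic, model = J)
  }
  best
}
```

For a design matrix x and response y, ss_gic(x, y, lambda = 0.1)$model returns the indices of the selected predictors; the full algorithm in the paper handles general convex losses via the corresponding log-likelihood in place of the Gaussian RSS term.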