We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling - best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of nonzero coefficients. While the best subset selection procedure is often perceived as the "gold standard" in sparse learning when the signal-to-noise ratio (SNR) is high, its predictive performance deteriorates when the SNR is low. In particular, it is outperformed by continuous shrinkage methods, such as ridge regression and the Lasso. We investigate the behavior of best subset selection in high-noise regimes and propose an alternative approach based on a regularized version of the least-squares criterion. Our proposed estimators (a) mitigate, to a large extent, the poor predictive performance of best subset selection in high-noise regimes; and (b) perform favorably, while generally delivering substantially sparser models, relative to the best predictive models available via ridge regression and the Lasso. We conduct an extensive theoretical analysis of the predictive properties of the proposed approach and provide justification for its superior predictive performance relative to best subset selection when the noise level is high. Our estimators can be expressed as solutions to mixed-integer second-order conic optimization problems and, hence, are amenable to modern computational tools from mathematical optimization.
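For concreteness, a minimal sketch of the two criteria contrasted above, in the abstract's notation but not necessarily the authors' exact formulation: best subset selection (BSS) with response $y$, design matrix $X$, and sparsity level $k$, and an illustrative shrinkage-regularized variant that adds a ridge penalty with tuning parameter $\lambda \ge 0$ (the symbols $k$ and $\lambda$ are introduced here for illustration only):
\begin{align*}
\text{(BSS)}\qquad & \min_{\beta \in \mathbb{R}^p} \ \tfrac{1}{2}\,\|y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_0 \le k,\\
\text{(regularized BSS)}\qquad & \min_{\beta \in \mathbb{R}^p} \ \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda \,\|\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_0 \le k.
\end{align*}
Under this sketch, the cardinality constraint can be modeled with binary indicator variables for the support of $\beta$, and the quadratic objective with a second-order cone, which is what makes such estimators expressible as mixed-integer second-order conic optimization problems.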