For decades, best subset selection (BSS) has eluded statisticians, mainly because of its computational bottleneck. Recent computational breakthroughs, however, have rekindled theoretical interest in BSS and led to new findings. Guo et al. (2020) showed that the model selection performance of BSS is governed by a margin quantity that is robust to design dependence, which is not the case for modern methods such as LASSO, SCAD, and MCP. Motivated by their theoretical results, in this paper we study the variable selection properties of BSS in the high-dimensional sparse linear regression setup. We show that, apart from the identifiability margin, two complexity measures play a fundamental role in characterizing the margin condition for model consistency: (a) the complexity of residualized features and (b) the complexity of spurious projections. In particular, we establish a simple margin condition that depends only on the identifiability margin quantity and the dominating one of the two complexity measures. Furthermore, we show that a margin condition depending on a similar margin quantity and similar complexity measures is also necessary for model consistency of BSS. To better understand the complexity measures, we also consider simple illustrative examples that demonstrate the variation in the complexity measures, which broadens our theoretical understanding of the model selection performance of BSS under different correlation structures.