On sure early selection of the best subset

The early solution path, which tracks the first few variables that enter the model of a selection procedure, is of profound importance to scientific discoveries. In practice, it is often statistically hopeless to identify all the important features with no false discovery, let alone the intimidating expense of experiments to test their significance. Such a realistic limitation calls for statistical guarantees for the early discoveries of a model selector. In this paper, we focus on the early solution path of best subset selection (BSS), where the sparsity constraint is set to be lower than the true sparsity. Under a sparse high-dimensional linear model, we establish the sufficient and (near) necessary condition for BSS to achieve sure early selection, or equivalently, zero false discovery throughout its early path. Essentially, this condition boils down to a lower bound of the minimum projected signal margin that characterizes the gap of the captured signal strength between sure selection models and those with spurious discoveries. Defined through projection operators, this margin is independent of the restricted eigenvalues of the design, suggesting the robustness of BSS against collinearity. Moreover, our model selection guarantee tolerates reasonable optimization error and thus applies to near best subsets. Finally, to overcome the computational hurdle of BSS in high dimensions, we propose the "screen then select" (STS) strategy to reduce dimension for BSS. Our numerical experiments show that the resulting early path exhibits a much lower false discovery rate (FDR) than LASSO, MCP and SCAD, especially in the presence of a highly correlated design. We also investigate the early paths of the iterative hard thresholding algorithms, which are greedy computational surrogates for BSS, and which yield an FDR comparable to that of our STS procedure.
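To make the "screen then select" idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes marginal-correlation screening as the screening step (the abstract does not say which screener is used) and replaces any specialized BSS solver with exhaustive enumeration over the screened candidates, which is feasible only for small problems. The function names `screen`, `best_subset`, and `sts_early_path`, as well as the simulated data and its parameters, are hypothetical and chosen for illustration only.

```python
import itertools
import numpy as np


def screen(X, y, keep):
    """Assumed screening step: keep the `keep` columns with the largest
    absolute marginal correlation with y."""
    scores = np.abs(X.T @ y) / np.linalg.norm(X, axis=0)
    top = np.argsort(scores)[::-1][:keep]
    return sorted(int(j) for j in top)


def best_subset(X, y, candidates, size):
    """Exhaustive best subset of a fixed size among `candidates`:
    the subset whose least-squares fit has the smallest residual sum of squares."""
    best, best_rss = None, np.inf
    for subset in itertools.combinations(candidates, size):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best, best_rss = subset, rss
    return best


def sts_early_path(X, y, keep, max_size):
    """Screen once, then run best subset selection for sizes 1..max_size."""
    candidates = screen(X, y, keep)
    return [best_subset(X, y, candidates, s) for s in range(1, max_size + 1)]


# Simulated sparse linear model (illustrative values, not from the paper).
rng = np.random.default_rng(0)
n, p = 200, 50
true_support = [0, 1, 2, 3]          # true sparsity is 4
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[true_support] = 3.0
y = X @ beta + rng.standard_normal(n)

# Early path: subset sizes 1..3, all below the true sparsity.
path = sts_early_path(X, y, keep=10, max_size=3)
for subset in path:
    false_discoveries = set(subset) - set(true_support)
    print(subset, "false discoveries:", len(false_discoveries))
```

In this sketch, sure early selection corresponds to every subset on the path being contained in `true_support`. Enumeration costs grow combinatorially with `keep` and `max_size`, which is the reason for screening before selecting.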

