We study the problem of identifying the set of \emph{active} variables, known in the literature as \emph{variable selection} or \emph{multiple hypothesis testing}, depending on the criteria pursued. For a general \emph{distribution-free} setting of non-normal, possibly dependent observations and a generalized notion of \emph{active set}, we propose a procedure that serves both tasks, variable selection and multiple testing, simultaneously. The procedure is based on the \emph{risk hull minimization} method, but can also be obtained from an empirical Bayes approach or a penalization strategy. We assess its quality via various criteria: the Hamming risk, FDR, FPR, FWER, NDR, FNR, and various \emph{multiple testing risks}, e.g., MTR=FDR+NDR, and exhibit a peculiar \emph{phase transition} phenomenon. Finally, we introduce and study, for the first time, the \emph{uncertainty quantification} problem in the variable selection and multiple testing context.
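As a rough illustration of the criteria listed above, the following sketch computes realized (single-sample) error proportions for one selected set against a known active set. It assumes the standard textbook conventions: the false discovery proportion is the share of selected variables that are inactive, the non-discovery proportion is the share of active variables that are missed, and the Hamming loss counts all misclassified variables. FDR, NDR and the Hamming risk would then be expectations of these quantities. The abstract does not state the exact definitions used for its generalized active set, so the function name `selection_errors` and these formulas are assumptions, not the authors' definitions.

```python
def selection_errors(selected, active):
    """Realized error proportions of a selected set against the true active set.

    Assumed (standard) conventions, not taken from the source:
      fdp     = |selected \\ active| / max(|selected|, 1)
      ndp     = |active \\ selected| / max(|active|, 1)
      hamming = |selected symmetric-difference active|
      mtp     = fdp + ndp  (realized analogue of MTR = FDR + NDR)
    """
    selected, active = set(selected), set(active)
    false_pos = len(selected - active)   # inactive variables that were selected
    false_neg = len(active - selected)   # active variables that were missed
    fdp = false_pos / max(len(selected), 1)
    ndp = false_neg / max(len(active), 1)
    return {
        "hamming": false_pos + false_neg,
        "fdp": fdp,
        "ndp": ndp,
        "mtp": fdp + ndp,
    }


if __name__ == "__main__":
    # Hypothetical example: 5 active variables, 4 selected, 3 of them correct.
    print(selection_errors(selected={1, 2, 3, 7}, active={1, 2, 3, 4, 5}))
```

Averaging these proportions over repeated samples would give Monte Carlo estimates of the corresponding risks under whatever data-generating model is assumed.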