It is common to report confidence intervals or $p$-values for selected features (predictor variables in regression), but these quantities often suffer from selection bias. The selective inference approach corrects this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific feature selection algorithm, such as Lasso, and thus have difficulty handling more complicated algorithms. Moreover, existing studies often condition on unnecessarily restrictive events, leading to over-conditioning and reduced statistical power. We propose a novel, widely applicable resampling method based on the multiscale bootstrap that addresses these issues by computing an approximately unbiased selective $p$-value for the selected features. As a simplification of the proposed method, we also develop a simpler method based on the classical bootstrap. We prove that the $p$-value computed by our multiscale bootstrap method is more accurate than that of the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods, such as non-convex regularization.
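To illustrate the core idea behind the multiscale bootstrap, the following is a minimal sketch, not the paper's full selective-inference algorithm. It uses a hypothetical toy selection rule (`selected`, chosen here only for illustration) and the standard multiscale recipe: compute bootstrap probabilities of the selection event at several scales $\sigma^2$ by varying the resample size $n' = n/\sigma^2$, fit the normalized $z$-value as a linear function of $\sigma^2$, and extrapolate to $\sigma^2 = -1$ to obtain an approximately unbiased $p$-value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy data: n observations; the "selection event" is "sample mean > 0".
# This rule is a stand-in for a real feature selection algorithm.
n = 100
x = rng.normal(loc=0.1, scale=1.0, size=n)

def selected(sample):
    # hypothetical selection rule, for illustration only
    return sample.mean() > 0

# Multiscale bootstrap: vary the resample size n' = n / sigma^2 and record
# the bootstrap probability of the selection event at each scale.
scales = np.array([0.5, 0.75, 1.0, 1.5, 2.0])  # sigma^2 values
B = 2000
z = []
for s2 in scales:
    m = max(int(round(n / s2)), 2)
    hits = sum(selected(rng.choice(x, size=m, replace=True)) for _ in range(B))
    bp = min(max(hits / B, 1.0 / B), 1.0 - 1.0 / B)  # keep bp away from 0 and 1
    # normalized bootstrap z-value at this scale
    z.append(np.sqrt(s2) * norm.ppf(1.0 - bp))

# Fit z(sigma^2) = beta0 + beta1 * sigma^2 and extrapolate to sigma^2 = -1;
# p_au = 1 - Phi(z(-1)) is the approximately unbiased p-value.
beta1, beta0 = np.polyfit(scales, z, 1)
p_au = 1.0 - norm.cdf(beta0 - beta1)  # z(-1) = beta0 - beta1
print(p_au)
```

In practice the linear model in $\sigma^2$ can be replaced by higher-order fits, and the selective $p$-value additionally conditions on the selection event; this sketch only shows the extrapolation mechanism shared by both.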