Conditional selective inference (SI) has been studied intensively as a new statistical inference framework for data-driven hypotheses. The basic concept of conditional SI is to make the inference conditional on the selection event, which enables an exact and valid statistical inference to be conducted even when the hypothesis is selected based on the data. Conditional SI has mainly been studied in the context of model selection, such as vanilla lasso or generalized lasso. The main limitation of existing approaches is the low statistical power owing to over-conditioning, which is required for computational tractability. In this study, we propose a more powerful and general conditional SI method for a class of problems that can be converted into quadratic parametric programming, which includes generalized lasso. The key concept is to compute the continuum path of the optimal solution in the direction of the selected test statistic and to identify the subset of the data space that corresponds to the model selection event by following the solution path. The proposed parametric programming-based method not only avoids the aforementioned major drawback of over-conditioning, but also improves the performance and practicality of SI in various respects. We conducted several experiments to demonstrate the effectiveness and efficiency of our proposed method.
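As a rough illustration of the key concept, the sketch below parametrizes the data along the direction of the test statistic and computes a selective p-value from a normal distribution truncated to the values of z where the same model is selected. It rests on assumptions not stated in the abstract: the data follow y ~ N(mu, sigma^2 I), the test statistic is the linear contrast eta^T y, and the null hypothesis is eta^T mu = 0. The function names and the example intervals are hypothetical. In the proposed method the intervals would come from following the parametric quadratic programming solution path, which is not implemented here.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def line_through_data(y, eta):
    """Write y as a + b * z_obs, where z_obs = eta^T y.

    Assumes covariance sigma^2 * I, so b = eta / ||eta||^2 and a is the
    part of y orthogonal to eta. Varying z moves the data along y(z) = a + b * z.
    """
    eta_sq = sum(e * e for e in eta)
    z_obs = sum(e * v for e, v in zip(eta, y))
    b = [e / eta_sq for e in eta]
    a = [v - bi * z_obs for v, bi in zip(y, b)]
    return a, b, z_obs


def selective_p_value(z_obs, intervals, sd):
    """Two-sided p-value of z_obs under N(0, sd^2) truncated to a union of intervals.

    `intervals` is a list of disjoint (lo, hi) pairs: the values of z for
    which the selection procedure returns the observed model.
    """
    total = 0.0  # probability mass of the whole truncation region
    below = 0.0  # mass of the region lying at or below z_obs
    for lo, hi in intervals:
        total += norm_cdf(hi / sd) - norm_cdf(lo / sd)
        if z_obs >= hi:
            below += norm_cdf(hi / sd) - norm_cdf(lo / sd)
        elif z_obs > lo:
            below += norm_cdf(z_obs / sd) - norm_cdf(lo / sd)
    if total <= 0.0:
        raise ValueError("truncation region has zero probability mass")
    cdf = below / total
    return 2.0 * min(cdf, 1.0 - cdf)


# Toy usage with made-up numbers (not results from the paper).
sigma = 1.0
y = [2.1, -0.3, 0.8]
eta = [1.0, 0.0, 0.0]
a, b, z_obs = line_through_data(y, eta)
sd = sigma * math.sqrt(sum(e * e for e in eta))

# Hypothetical truncation region: pretend the path-following step found that
# the observed model is selected only for z in these two intervals.
intervals = [(-math.inf, -1.5), (1.2, math.inf)]
p_selective = selective_p_value(z_obs, intervals, sd)
p_naive = selective_p_value(z_obs, [(-math.inf, math.inf)], sd)
print(p_selective, p_naive)
```

Conditioning on the full union of intervals, rather than on a single interval as over-conditioned approaches do, is what the abstract credits for the gain in statistical power. The naive p-value is included only to contrast with the selective one.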