We consider testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables. We assume that this set is selected via a very simple approach that is commonly used across scientific disciplines: we select a set of variables for which the correlation with all variables outside the set falls below some threshold. Unlike in other selective inference settings, failing to account for the selection step here leads to excessively conservative (as opposed to anti-conservative) results. Our proposed test properly accounts for the fact that the set of variables is selected from the data, and thus is not overly conservative. To develop our test, we condition on the event that the selection resulted in the set of variables in question. To achieve computational tractability, we develop a new characterization of the conditioning event in terms of the canonical correlations between the groups of random variables. In simulation studies and in the analysis of gene co-expression networks, we show that our approach has much higher power than a "naive" approach that ignores the effect of selection.
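The selection step described above can be illustrated with a minimal sketch. It assumes, as an illustration only, that the sets are obtained as the connected components of the graph linking two variables whenever their absolute sample correlation exceeds the threshold; by construction, every correlation between a component and the variables outside it is then at or below the threshold. The function name `select_groups` and its parameters are hypothetical and are not taken from the paper. The sketch covers only the selection, not the proposed conditional test.

```python
import numpy as np


def select_groups(X, threshold):
    """Partition the columns of X into groups whose cross-group absolute
    sample correlations are all at or below `threshold`.

    Hypothetical helper for illustration: groups are the connected
    components of the thresholded absolute-correlation graph.
    """
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    p = R.shape[0]
    adj = np.abs(R) > threshold        # edge when |correlation| exceeds threshold
    labels = np.full(p, -1)            # -1 marks a variable not yet assigned
    current = 0
    for start in range(p):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded graph from `start`.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return [np.flatnonzero(labels == k) for k in range(current)]
```

Given an n-by-p data matrix `X`, a call such as `select_groups(X, 0.5)` returns a list of index arrays, one per selected set. Testing whether one of these sets is independent of the rest without accounting for how it was chosen corresponds to the "naive" approach mentioned above.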