StarTrek: Combinatorial Variable Selection with False Discovery Rate Control

Variable selection on large-scale networks has been extensively studied in the literature. While most existing methods are limited to local functionals, especially graph edges, this paper focuses on selecting the discrete hub structures of networks. Specifically, we propose an inferential method, called the StarTrek filter, to select hub nodes whose degrees exceed a given threshold in high dimensional graphical models while controlling the false discovery rate (FDR). Discovering hub nodes in networks is challenging for two reasons: there is no straightforward statistic for testing the degree of a node because of the combinatorial structure, and the complicated dependence in the multiple testing problem is hard to characterize and control. In methodology, the StarTrek filter overcomes these challenges by constructing p-values based on maximum test statistics via the Gaussian multiplier bootstrap. In theory, we show that the StarTrek filter controls the FDR by providing accurate bounds on the approximation errors of the quantile estimation and by addressing the dependence structures among the maximal statistics. To this end, we establish novel Cram\'er-type comparison bounds for high dimensional Gaussian random vectors. Compared to the Gaussian comparison bound via the Kolmogorov distance established by \citet{chernozhukov2014anti}, our Cram\'er-type comparison bounds bound the relative difference between the distribution functions of two high dimensional Gaussian random vectors. We illustrate the validity of the StarTrek filter in a series of numerical experiments and apply it to the genotype-tissue expression dataset to discover central regulator genes.
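To make the two ingredients named above more concrete, the following is a minimal sketch, not the StarTrek filter itself. It assumes each node comes with a hypothetical `(n, m)` array of per-sample score contributions, computes a p-value for the maximum statistic with a Gaussian multiplier bootstrap, and then applies a standard Benjamini-Hochberg step as a stand-in for FDR control. The function names, the score construction, and the choice of Benjamini-Hochberg are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def multiplier_bootstrap_pvalue(scores, n_boot=2000, seed=None):
    """P-value for T = max_j |n^{-1/2} * sum_i scores[i, j]|.

    `scores` is a hypothetical (n, m) array of per-sample contributions
    to m statistics; it is an assumption of this sketch, not the paper's
    construction.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n, _ = scores.shape

    # Observed maximum statistic.
    t_obs = np.abs(scores.sum(axis=0)).max() / np.sqrt(n)

    # Centre the scores, then reweight them with i.i.d. N(0, 1) multipliers.
    centred = scores - scores.mean(axis=0)
    xi = rng.standard_normal((n_boot, n))
    t_boot = np.abs(xi @ centred).max(axis=1) / np.sqrt(n)

    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)


def benjamini_hochberg(pvals, q=0.1):
    """Boolean rejection mask from the standard Benjamini-Hochberg step-up rule."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m

    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest sorted index passing the rule
        rejected[order[: k + 1]] = True
    return rejected
```

In use, one would compute one p-value per candidate node, for example `pvals = [multiplier_bootstrap_pvalue(s) for s in per_node_scores]` with a hypothetical list `per_node_scores`, and pass the result to `benjamini_hochberg(pvals, q=0.1)`. Plain Benjamini-Hochberg does not by itself account for the dependence among maximal statistics that the paper analyses, so this sketch only illustrates the general shape of such a procedure.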
