We propose a new method to learn the structure of a Gaussian graphical model with finite sample false discovery rate (FDR) control. Our method builds on the knockoff framework of Barber and Cand\`{e}s for linear models. We extend their approach to the graphical model setting via a local (node-based) and a global (graph-based) step: we construct knockoffs and feature statistics for each node locally, and then solve a global optimization problem to determine a threshold for each node. We then estimate the neighborhood of each node by comparing its feature statistics to its threshold, yielding our graph estimate. Our proposed method is very flexible, in the sense that there is freedom in the choice of knockoffs, feature statistics, and the way in which the final graph estimate is obtained. For any given data set, it is not clear a priori which choices of these hyperparameters are optimal. We therefore use a sample-splitting-recycling procedure that first uses half of the samples to select the hyperparameters and then learns the graph using all samples, in such a way that finite sample FDR control still holds. We compare our method to several competitors in simulations and on a real data set.
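As background for the thresholding step described above, the following is a minimal sketch of the classical knockoff+ selection rule of Barber and Cand\`{e}s for a single vector of feature statistics, which in the graphical setting would be applied per node; it is not the paper's global optimization, and the names `knockoff_threshold`, `W`, and `q` are illustrative.

```python
import numpy as np

def knockoff_threshold(W, q, offset=1):
    """Barber-Candes knockoff threshold for feature statistics W.

    Large positive W[j] is evidence that feature j is a true feature;
    offset=1 gives the knockoff+ variant with finite sample FDR control
    at level q. Returns np.inf if no threshold achieves the bound.
    """
    # Candidate thresholds: the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf

# Toy feature statistics for one node (illustrative values only).
W = np.array([4.2, -0.5, 3.1, 2.8, -1.0, 5.0, 0.7, -0.2])
tau = knockoff_threshold(W, q=0.3)
selected = np.where(W >= tau)[0]  # estimated neighbors of this node
```

In the paper's method the per-node thresholds are instead coupled through a global optimization problem, but each node's final neighborhood estimate is obtained by exactly this kind of comparison of its feature statistics to its threshold.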