In this work, we consider a two-sample problem within the framework of Gaussian graphical models. When the global hypothesis that the two distributions are equal is rejected, interest usually turns to localizing the source of the difference. Motivated by the idea that diseases can be seen as system perturbations, and by the need to distinguish the origin of a perturbation from the components it affects, we introduce the concept of a minimal seed set and its graphical counterpart, the graphical seed set. Intuitively, these sets consist of the variables driving the difference between the two conditions. We propose a simple testing procedure, linear in the number of nodes, to estimate the graphical seed set from data, and we study its finite-sample behavior in a simulation study. We illustrate our approach in the context of gene set analysis using a publicly available gene expression dataset.