In many applications of statistical graphical models, the data originate from several subpopulations that share similarities but also differ significantly. This raises the question of how to estimate several graphical models simultaneously. Pooling all the data to estimate a single graph would ignore the differences among subpopulations. On the other hand, estimating a graph from each subpopulation separately does not make efficient use of the common structure in the data. We develop a new method for simultaneous estimation of multiple graphical models by estimating the topological neighborhoods of the involved variables under a sparsity-inducing penalty that takes into account the common structure in the subpopulations. Unlike existing methods for joint graphical models, our method does not rely on spectral decomposition of large matrices, and is therefore more computationally attractive for estimating large networks. In addition, we develop the asymptotic properties of our method, demonstrate its numerical complexity, and compare it with several existing methods by simulation. Finally, we apply our method to the estimation of genomic networks for a lung cancer dataset that consists of several subpopulations.
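To make the idea of joint neighborhood estimation concrete, the following is a minimal sketch and not the method developed in the paper. It assumes a generic sparse group lasso penalty (an elementwise lasso term plus a group term that ties each coefficient across subpopulations), solved by plain proximal gradient descent; the abstract does not specify the actual penalty or solver. The function names `sparse_group_prox`, `joint_neighborhood`, and `joint_graphs`, the tuning parameters `lam1` and `lam2`, and the "OR" symmetrization rule are all illustrative assumptions.

```python
import numpy as np


def sparse_group_prox(B, t1, t2):
    """Proximal map of t1*||B||_1 + t2*sum_i ||B[i, :]||_2 (rows are groups)."""
    # Elementwise soft-thresholding handles the lasso part.
    S = np.sign(B) * np.maximum(np.abs(B) - t1, 0.0)
    # Row-wise shrinkage handles the group part shared across subpopulations.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    return S * np.maximum(0.0, 1.0 - t2 / np.maximum(norms, 1e-12))


def joint_neighborhood(Xs, j, lam1, lam2, n_iter=500):
    """Regress variable j on all others in every subpopulation at once.

    Xs is a list of (n_k, p) data matrices, one per subpopulation.
    Returns the indices of the other variables and a (p-1, K) coefficient
    matrix whose column k belongs to subpopulation k.
    """
    p = Xs[0].shape[1]
    others = [i for i in range(p) if i != j]
    A = [X[:, others] for X in Xs]
    y = [X[:, j] for X in Xs]
    # Lipschitz constant of the gradient of the summed least-squares losses.
    L = max(np.linalg.norm(Ak, 2) ** 2 / Ak.shape[0] for Ak in A)
    B = np.zeros((p - 1, len(Xs)))
    for _ in range(n_iter):
        # Gradient of (1 / (2 n_k)) * ||y_k - A_k b_k||^2, one column per k.
        G = np.column_stack([
            Ak.T @ (Ak @ B[:, k] - yk) / Ak.shape[0]
            for k, (Ak, yk) in enumerate(zip(A, y))
        ])
        B = sparse_group_prox(B - G / L, lam1 / L, lam2 / L)
    return others, B


def joint_graphs(Xs, lam1, lam2, tol=1e-8):
    """Estimate one undirected graph per subpopulation (assumed OR rule)."""
    K, p = len(Xs), Xs[0].shape[1]
    adj = np.zeros((K, p, p), dtype=bool)
    for j in range(p):
        others, B = joint_neighborhood(Xs, j, lam1, lam2)
        for k in range(K):
            adj[k, j, others] = np.abs(B[:, k]) > tol
    # Keep an edge if either endpoint selects the other as a neighbor.
    return adj | adj.transpose(0, 2, 1)
```

In this sketch, each node requires only regressions on the remaining variables, so no eigendecomposition of a large matrix is involved. The group term encourages an edge to be present or absent in all subpopulations together, while the lasso term still allows individual subpopulations to drop an edge.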