In this paper, we investigate the Gaussian graphical model inference problem in a novel setting that we call erose measurements, referring to irregularly measured or observed data. For graphs, this results in different node pairs having vastly different sample sizes, a situation that frequently arises in data integration, genomics, neuroscience, and sensor networks. Existing works characterize graph selection performance using the minimum pairwise sample size, which provides little insight for erosely measured data, and no existing inference method is applicable. We aim to fill this gap by proposing GI-JOE (Graph Inference when Joint Observations are Erose), the first inference method that characterizes the different uncertainty levels over the graph caused by erose measurements. Specifically, we develop an edge-wise inference method and an affiliated FDR control procedure, where the variance of each edge depends on the sample sizes associated with its corresponding neighbors. We prove statistical validity under erose measurements through careful localized edge-wise analysis and by disentangling the dependencies across the graph. Finally, through simulation studies and a real neuroscience data example, we demonstrate the advantages of our inference methods for graph selection from erosely measured data.
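To make the notion of pairwise sample sizes concrete, the following minimal Python sketch counts, for every node pair, the number of samples in which both nodes were observed. It is an illustration only and not part of GI-JOE: the function name `pairwise_sample_sizes` and the toy observation mask are hypothetical, and the mask is assumed to be a samples-by-nodes boolean array.

```python
import numpy as np

def pairwise_sample_sizes(observed):
    """Return the p x p matrix N where N[j, k] is the number of samples
    in which node j and node k were both observed.

    `observed` is assumed to be an (n_samples, p) boolean mask.
    """
    obs = np.asarray(observed, dtype=np.int64)
    # Entry (j, k) of obs.T @ obs sums the indicator "j and k both observed".
    return obs.T @ obs

# Hypothetical toy mask: 6 samples, 3 nodes, irregularly observed.
observed = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
], dtype=bool)

N = pairwise_sample_sizes(observed)
print(N)

# The minimum pairwise sample size summarizes only the worst-observed pair.
off_diag = N[~np.eye(N.shape[0], dtype=bool)]
print("min pairwise n:", off_diag.min(), "max pairwise n:", off_diag.max())
```

In this toy mask the pair (0, 1) is jointly observed twice as often as the pairs involving node 2, so a single minimum pairwise sample size would understate how much information is available for the better-observed edge.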