In contrast to time series, graphical data is data indexed by the vertices and edges of a graph. Modern applications such as the internet, social networks, genomics and proteomics generate graphical data, often at large scale. The large scale argues for the need to compress such data for storage and subsequent processing. Since this data might have several components available in different locations, it is also important to study distributed compression of graphical data. In this paper, we derive a rate region for this problem which is a counterpart of the Slepian-Wolf theorem. We characterize the rate region when the statistical description of the distributed graphical data can be modeled as being one of two types - as a member of a sequence of marked sparse Erdos-Renyi ensembles or as a member of a sequence of marked configuration model ensembles. Our results are in terms of a generalization of the notion of entropy introduced by Bordenave and Caputo in the study of local weak limits of sparse graphs. Furthermore, we give a generalization of this result for Erdos-Renyi and configuration model ensembles with more than two sources.
翻译:与时间序列相对照,图形数据是用图表的脊椎和边缘进行索引的数据。现代应用,如互联网、社交网络、基因组学和蛋白质组学等,往往大规模地生成图形数据。大比例表示需要压缩这些数据用于储存和随后的处理。由于这些数据在不同地点可能有几个组件,研究分布式图形数据压缩也很重要。在本文中,我们为这一问题得出一个比率区域,这是Slepian-Wolf 理论的对应区域。当分布式图形数据的统计描述可以建模为两种类型之一时,我们给这个比率区域定性,即分布式图形数据的统计描述可以建模为两种类型之一――作为有标记的稀薄 Erdos-Renyi ensengembles 序列的成员,或者作为有标记的配置模型组合序列的成员。我们的结果是,Bordenave和Caputo在研究本地稀薄图形的弱限值时,将这种结果概括为两种类型之一。此外,我们用两个比模型来源更宽泛的Erdos-Renyi和配置模型来源。