Many modern applications involve accessing and processing graphical data, i.e., data that is naturally indexed by graphs. Examples come from internet graphs, social networks, genomics and proteomics, and other sources. The typically large size of such data motivates the search for efficient ways to compress and decompress it. Existing compression methods are usually tailored to specific models or do not provide theoretical guarantees. In this paper, we introduce a low-complexity lossless compression algorithm for sparse marked graphs, i.e., graphical data indexed by sparse graphs, which can universally achieve the optimal compression rate in a precisely defined sense. To define universality, we employ the framework of local weak convergence, which allows one to make sense of a notion of stochastic processes for sparse graphs. Moreover, we evaluate the performance of our algorithm experimentally on both synthetic and real-world data.