Many multivariate data such as social and biological data exhibit complex dependencies that are best characterized by graphs. Unlike sequential data, graphs are, in general, unordered structures. This means we can no longer use classic, sequential-based compression methods on these graph-based data. Therefore, it is necessary to develop new methods for graph compression. In this paper, we present universal source coding methods for the lossless compression of unweighted, undirected, unlabelled graphs. We encode in two steps: 1) transforming graph into a rooted binary tree, 2) the encoding rooted binary tree using graph statistics. Our coders showed better compression performance than other source coding methods on both synthetic and real-world graphs. We then applied our graph coding methods for model selection of Gaussian graphical models using minimum description length (MDL) principle finding the description length of the conditional independence graph. Experiments on synthetic data show that our approach gives better performance compared to common model selection methods. We also applied our approach to electrocardiogram (ECG) data in order to explore the differences between graph models of two groups of subjects.
翻译:社会数据和生物数据等许多多变数据表现出以图表为最佳特征的复杂依赖性。 与相继数据不同, 图表一般是无顺序结构。 这意味着我们不能再在这些基于图形的数据中使用经典的、 以顺序为基础的压缩方法。 因此, 有必要开发新的图形压缩方法 。 在本文中, 我们为无损的、 未加权的、 未定向的、 未标签的图形提供了通用源代码方法。 我们分两个步骤编码:1) 将图形转换成根拔的二进制树,2) 使用图表统计的编码根拔的二进制树。 我们的编码器显示, 压缩性能优于合成图和真实世界图的其他源编码方法。 我们随后运用了我们的图形编码方法, 用于使用最小描述长度( MDL) 原则来选择高斯图形模型, 查找有条件的独立图形的描述长度。 合成数据的实验显示, 我们的方法比通用的模型选择方法表现得更好。 我们还采用了电心图(ECG) 数据方法, 以便探索两个主题组的图形模型的差异 。