Graph summarization via node grouping is a popular method for building concise graph representations: nodes of the original graph are grouped into supernodes, and edges are encoded as superedges, so that the loss of adjacency information is minimized. Such summaries are widely used in large-scale graph analytics because of their small size and efficient query processing. In this paper, we reformulate the loss minimization problem for summarization as an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions to the integer maximization, we analytically expose its underlying connection to the spectral properties of the adjacency matrix. Building on this, we design a two-phase algorithm called SpecSumm. In the first phase, motivated by spectral graph theory, we apply k-means clustering to the eigenvectors of the adjacency matrix that correspond to its k largest-magnitude eigenvalues, assigning nodes to supernodes. In the second phase, we propose a greedy heuristic that updates this initial assignment to further improve summary quality. Finally, through extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared with state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
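One way to see the spectral connection, under assumptions of ours rather than the paper's stated definitions: treat the graph as undirected with symmetric adjacency matrix A, let each superedge weight be the average of A over the corresponding pair of supernodes S_i, S_j, and measure loss as the squared Frobenius reconstruction error. With Q the column-normalized supernode membership matrix, loss minimization becomes maximization of ‖QᵀAQ‖²_F, and relaxing Q to any matrix with orthonormal columns yields an optimum spanned by the eigenvectors whose eigenvalues are largest in magnitude.

```latex
% Assumed setting (illustrative): symmetric A, block-average superedge weights,
% squared Frobenius loss. Q is n x k with orthonormal columns.
Q_{vi} = \begin{cases} 1/\sqrt{|S_i|} & v \in S_i \\ 0 & \text{otherwise} \end{cases},
\qquad Q^\top Q = I_k, \qquad P = QQ^\top, \qquad
\hat{A} = PAP, \quad \hat{A}_{uv} = \frac{1}{|S_i|\,|S_j|}\sum_{x \in S_i,\, y \in S_j} A_{xy}
\ \ (u \in S_i,\ v \in S_j).

% P is symmetric and idempotent, so <A, PAP> = tr(APAP) = ||PAP||_F^2 = ||Q^T A Q||_F^2:
\|A - PAP\|_F^2 = \|A\|_F^2 - 2\,\operatorname{tr}(APAP) + \|PAP\|_F^2
                = \|A\|_F^2 - \|Q^\top A Q\|_F^2 .

% Hence minimizing the loss over partitions is maximizing ||Q^T A Q||_F^2 over
% indicator-derived Q. Relaxing to all orthonormal Q, with |lambda_(1)| >= ... >= |lambda_(n)|:
\max_{Q^\top Q = I_k} \|Q^\top A Q\|_F^2 \;\le\; \max_{Q^\top Q = I_k} \operatorname{tr}(Q^\top A^2 Q)
  \;=\; \sum_{i=1}^{k} \lambda_{(i)}^2 ,
% with equality when the columns of Q are the eigenvectors of lambda_(1), ..., lambda_(k).
```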
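A minimal sketch of the first phase, assuming an undirected graph given as a dense symmetric NumPy adjacency matrix. The function name `spectral_supernodes`, the deterministic farthest-point initialization, and the plain Lloyd iterations are illustrative choices, not necessarily what SpecSumm itself does; the point is only to show k-means applied to the eigenvectors of the k largest-magnitude eigenvalues.

```python
import numpy as np


def spectral_supernodes(adj, k, n_iter=100):
    """Assign each of n nodes to one of k supernodes (illustrative sketch).

    Embeds nodes with the eigenvectors of the symmetric adjacency matrix whose
    eigenvalues are largest in absolute value, then runs Lloyd's k-means.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(adj)
    # Keep the k eigenvectors with the largest |eigenvalue| (not the largest value).
    top = np.argsort(-np.abs(eigvals))[:k]
    emb = eigvecs[:, top]  # shape (n, k): one embedding row per node

    # Deterministic farthest-point initialization (a simple k-means++ stand-in).
    centroids = [emb[0]]
    for _ in range(1, k):
        d2 = np.min([((emb - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(emb[int(d2.argmax())])
    centroids = np.array(centroids)

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared distance from every node to every centroid, shape (n, k).
        dists = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = emb[labels == c]
            if len(members):  # keep an empty cluster's centroid where it was
                centroids[c] = members.mean(axis=0)
    return labels
```

On two disjoint 4-cliques with k = 2, for example, each clique ends up in its own supernode. For graphs with millions of nodes, the dense `np.linalg.eigh` call would be replaced by a sparse solver such as SciPy's `scipy.sparse.linalg.eigsh` with `which='LM'` (largest magnitude).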
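The abstract does not specify the second-phase greedy heuristic, so the following is a hypothetical stand-in: a best-improvement local search that moves one node at a time to whichever supernode most increases ‖QᵀAQ‖²_F (Q being the column-normalized membership matrix). Under the assumed squared-error loss with block-average superedge weights, the loss equals ‖A‖²_F minus this quantity, so each accepted move lowers the loss. The names `objective` and `greedy_refine` are ours, not the paper's.

```python
import numpy as np


def objective(adj, labels, k):
    """Return ||Q^T A Q||_F^2, Q being the column-normalized membership matrix.

    Under the assumed squared-error loss, loss = ||A||_F^2 - objective, so a
    larger value means a better summary. Empty supernodes contribute nothing.
    """
    n = adj.shape[0]
    member = np.zeros((n, k))
    member[np.arange(n), labels] = 1.0       # 0/1 node-to-supernode matrix
    sizes = member.sum(axis=0)               # |S_i| for each supernode
    block = member.T @ adj @ member          # adjacency mass between supernodes
    denom = np.outer(sizes, sizes)           # |S_i| * |S_j|
    mask = denom > 0
    return float((block[mask] ** 2 / denom[mask]).sum())


def greedy_refine(adj, labels, k, max_passes=10):
    """Hypothetical best-improvement local search over single-node moves."""
    adj = np.asarray(adj, dtype=float)
    labels = np.array(labels, dtype=int)     # copy: the caller's array is untouched
    best = objective(adj, labels, k)
    for _ in range(max_passes):
        improved = False
        for v in range(len(labels)):
            original = best_target = labels[v]
            for target in range(k):
                if target == original:
                    continue
                labels[v] = target
                score = objective(adj, labels, k)
                if score > best + 1e-12:     # strict gain, tolerant of rounding
                    best, best_target = score, target
            labels[v] = best_target          # keep the best move (or none)
            if best_target != original:
                improved = True
        if not improved:
            break                            # local optimum: no single move helps
    return labels
```

Starting from an assignment that places one node of a 4-clique with the other clique, a single pass moves it back. Re-evaluating the full objective for every candidate move is deliberately naive; an efficient version would update the block sums incrementally as each node moves.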