Existing graph summarization algorithms are tailored to specific graph summary models, only support one-time batch computation, are designed and implemented for a specific task, or are evaluated using static graphs. Our novel, incremental, parallel algorithm addresses all of these shortcomings. We support infinitely many structural graph summary models defined in a formal language. All graph summaries can be updated in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ is its maximum degree, and $k$ is the maximum distance in the subgraphs considered while summarizing. We empirically evaluate the performance of our incremental algorithm on benchmark and real-world datasets. Overall, our experiments show that, for commonly used summary models and datasets, the incremental summarization algorithm almost always outperforms its batch counterpart, even when about $50\%$ of the graph database changes. Updating the summaries of the real-world DyLDO-core dataset with our incremental algorithm is $5$ to $44$~times faster than computing a new summary when using four cores. Furthermore, the incremental computations incur a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even when using fewer cores.
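To make the idea of an incremental update concrete, the following is a minimal sketch and not the algorithm evaluated in this work. It assumes a toy summary model with $k = 1$ in which two vertices share a summary node exactly when they have the same set of outgoing edge labels. The names `IncrementalSummary`, `apply`, and `batch_summary` are hypothetical and chosen only for illustration. Each changed edge causes only its source vertex to be re-summarized; under a model with larger $k$, one would presumably also have to revisit vertices within distance $k$ of the change, which is where a factor like $d^k$ would come from.

```python
from collections import defaultdict


class IncrementalSummary:
    """Toy k=1 structural summary (illustrative assumption, not the paper's model).

    Vertices with the same set of outgoing edge labels share one summary node.
    """

    def __init__(self):
        self.out_edges = defaultdict(set)  # vertex -> {(label, target)}
        self.schema_of = {}                # vertex -> frozenset of labels
        self.members = defaultdict(set)    # schema -> set of vertices

    def _refresh(self, v):
        # Detach v from its previous summary node, if it had one.
        old = self.schema_of.pop(v, None)
        if old is not None:
            self.members[old].discard(v)
            if not self.members[old]:
                del self.members[old]
        # Re-attach v under its current schema, or forget it if it has no edges.
        if self.out_edges[v]:
            new = frozenset(label for label, _ in self.out_edges[v])
            self.schema_of[v] = new
            self.members[new].add(v)
        else:
            del self.out_edges[v]

    def apply(self, additions=(), deletions=()):
        """Apply edge changes; return how many vertices were re-summarized."""
        touched = set()
        for source, label, target in additions:
            self.out_edges[source].add((label, target))
            touched.add(source)
        for source, label, target in deletions:
            self.out_edges[source].discard((label, target))
            touched.add(source)
        for v in touched:
            self._refresh(v)
        return len(touched)


def batch_summary(edges):
    """Recompute the same toy summary from scratch, for comparison."""
    labels = defaultdict(set)
    for source, label, _ in edges:
        labels[source].add(label)
    members = defaultdict(set)
    for v, vertex_labels in labels.items():
        members[frozenset(vertex_labels)].add(v)
    return dict(members)


# Usage: build a summary, then apply a small change set.
edges = [
    ("a", "type", "Person"), ("a", "name", "Alice"),
    ("b", "type", "Person"), ("b", "name", "Bob"),
    ("c", "type", "City"),
]
summary = IncrementalSummary()
summary.apply(additions=edges)
touched = summary.apply(
    additions=[("c", "name", "Paris")],
    deletions=[("b", "name", "Bob")],
)
```

In this sketch the work per update is proportional to the number of touched vertices and their degrees, independent of the size of the rest of the graph, and the resulting partition matches what `batch_summary` would return on the updated edge list.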