The large sizes of real-world graphs in many domains make graph processing computationally prohibitive. A practical approach is to perform analytics on the compressed summary of the graph that retains the essential structure with bounded error. Many summarization methods have been proposed; however, these methods have three main shortcomings: i) mainly focused on minimizing the reconstruction error and often produce dense and memory extensive summaries ii) most of them do not scale to large graphs and iii) current techniques usually do not consider node attributes during summarization. Although, these attributes play an important role in drawing actionable analytics from real-world graphs. We devise a unified approach, SsAG, that is scalable, incorporates graph topology and node attributes for summarization and produces sparse summaries. We propose SsAG, a probabilistic approach to select nodes for merging by incorporating both the graph topology and node attribute information to make a summary. Furthermore, we present a sparsification approach to reduce the size of the summary with a negligible impact on reconstruction error. We also present analytical bounds on the runtime and provide an approximation guarantee on the quality of our solution. We compare SsAG with the state-of-the-art methods to show that SsAG is comparable in quality and more efficient and scalable. We further demonstrate the goodness of SsAG by accurately and efficiently answering the queries related to the graph structure and attribute information using the summary only.
翻译:在许多域中,真实世界图的大小很大,使得图表处理的计算令人难以接受。一个实用的做法是对图表的压缩摘要进行分析,保留基本结构的基本结构,但提出了许多概括方法;然而,这些方法有三个主要缺陷:(1) 主要是尽量减少重建错误,并往往产生大量和记忆广泛的摘要;(2) 大部分不至于大图和三) 目前的技术通常不考虑总和期间的节点属性。虽然这些属性在从真实世界图中绘制可操作分析的解析器方面起着重要作用。我们设计了一个统一的方法,即SSAG,这是可缩放的,包含图表的表层和节点属性,以进行汇总,并产生少许的摘要。我们建议SSAG,一种选择合并节点的概率方法,既包括图表表层表,又不将信息归结为摘要。此外,我们提出一种缩小概要规模的方法,对重建错误的影响微不足道。我们还在运行时的图表中提出分析约束性标定,并且更准确地将SAG的精度与SAR的精度进行对比,我们用更精确的精度来显示SA的精度的精度的精度的精度。