Graphs are an essential data structure that can represent the structure of social networks. Many online companies, in order to provide intelligent and personalized services for their users, aim to comprehensively analyze a significant amount of graph data with different features. One example is k-core decomposition which captures the degree of connectedness in social graphs. The main purpose of this report is to explore a distributed algorithm for k-core decomposition on Apache Giraph. Namely, we would like to determine whether a cluster-based, Giraph implementation of k-core decomposition that we provide is more efficient than a single-machine, disk-based implementation on GraphChi for large networks. In this report, we describe (a) the programming model of Giraph and GraphChi, (b) the specific implementation of k-core decomposition with Giraph, and (c) the result comparison between Giraph and GraphChi. By analyzing the results, we conclude that Giraph is faster than GraphChi when dealing with large data. However, since worker nodes need time to communicate with each other, Giraph is not very efficient for small data.
翻译:许多在线公司为了向用户提供智能和个性化服务,旨在全面分析大量具有不同特点的图形数据。其中一个例子是K-核心分解,该分解反映了社会图中的连接程度。本报告的主要目的是探索在Apache Giraph上对K-核心分解进行分布的算法。也就是说,我们希望确定我们提供的K-核心分解的集基基拉夫实施k-核心分解是否比在GreaphChi上为大型网络提供单机、磁盘执行效率更高。我们在本报告中描述:(a) Giraph和GreagChi的编程模型,(b)与Giraph的K-核心分解的具体实施,以及(c) Giraph和GrapChi之间的结果比较。通过分析结果,我们得出结论,在处理大型数据时,Giraph比Giraph更快。然而,由于工人需要时间相互沟通,Giraph对小型数据来说并不十分有效。