K-core decomposition is a commonly used metric to analyze graph structure or study the relative importance of nodes in complex graphs. Recent years have seen rapid growth in the scale of the graph, especially in industrial settings. For example, our industrial partner runs popular social applications with billions of users and is able to gather a rich set of user data. As a result, applying K-core decomposition on large graphs has attracted more and more attention from academics and the industry. A simple but effective method to deal with large graphs is to train them in the distributed settings, and some distributed K-core decomposition algorithms are also proposed. Despite their effectiveness, we experimentally and theoretically observe that these algorithms consume too many resources and become unstable on super-large-scale graphs, especially when the given resources are limited. In this paper, we deal with those super-large-scale graphs and propose a divide-and-conquer strategy on top of the distributed K-core decomposition algorithm. We evaluate our approach on three large graphs. The experimental results show that the consumption of resources can be significantly reduced, and the calculation on large-scale graphs becomes more stable than the existing methods. For example, the distributed K-core decomposition algorithm can scale to a large graph with 136 billion edges without losing correctness with our divide-and-conquer technique.
翻译:K核心分解是分析图表结构或研究复杂图表中节点相对重要性的一种常用的衡量标准。 近些年来,特别是在工业环境中,图表的规模迅速增长。 例如, 我们的工业伙伴在数十亿用户中运用广受欢迎的社会应用程序,能够收集丰富的用户数据。 因此, 在大型图表中应用K核心分解吸引了学术界和工业界越来越多的关注。 处理大图表的一个简单而有效的方法是在分布式设置中培训它们, 并且还提出了一些分布式K核心分解算法。 尽管这些算法的有效性很高,但我们实验和理论上都观察到,这些算法消耗了太多资源,在超大型图表中变得不稳定,特别是在给定的资源有限的情况下。 在本文中,我们处理这些超大型图表,并在分布式K核心分解算法之外提出分解战略。 我们在三个大图表中评估我们的方法。 实验结果表明,资源消耗量可以大大减少,而在实验结果中,在理论上,在超大型图表上这些算法会变得不稳定。 在大规模比例上, 以现有比例法来计算,可以更稳定地分析, 。