Community detection is an important research topic in graph analytics with a wide range of applications. A variety of static community detection algorithms and quality metrics have been developed in recent years. However, most real-world graphs are not static and change over time. With streaming data, the communities in the associated graph must be updated either continuously or whenever new data streams are added to the graph. This makes it considerably more challenging to devise good community detection algorithms that maintain dynamic graphs over streaming data. In this paper, we propose an incremental community detection algorithm for maintaining a dynamic graph over streaming data. The contributions of this study are (a) an implementation of a Distributed Weighted Community Clustering (DWCC) algorithm, (b) the design and implementation of a novel Incremental Distributed Weighted Community Clustering (IDWCC) algorithm, and (c) an experimental study comparing the performance of our IDWCC algorithm with that of the DWCC algorithm. We validate the functionality and efficiency of our framework in processing streaming data and performing large-scale, in-memory, distributed dynamic graph analytics. The results demonstrate that our IDWCC algorithm runs up to three times faster than the DWCC algorithm at similar accuracy.
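To make the contrast between static and incremental maintenance concrete, the following is a minimal single-machine sketch. It is not the DWCC or IDWCC algorithm. It assumes a quality metric built on per-vertex triangle counts and shows only the general principle: when an edge arrives, update the few affected vertices instead of recomputing the whole graph. The class name `TriangleIndex` and its methods are hypothetical, introduced purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


class TriangleIndex:
    """Per-vertex triangle counts for an undirected simple graph (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)        # vertex -> set of neighbours
        self.triangles = defaultdict(int)  # vertex -> number of triangles it belongs to

    def add_edge(self, u, v):
        """Incremental path: insert edge (u, v) and touch only the affected vertices."""
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbour w of u and v closes one new triangle {u, v, w}.
        common = self.adj[u] & self.adj[v]
        for w in common:
            self.triangles[w] += 1
        self.triangles[u] += len(common)
        self.triangles[v] += len(common)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def recompute(self):
        """Static baseline: recount every triangle from scratch."""
        counts = defaultdict(int)
        for u in list(self.adj):
            # A pair of neighbours of u that are themselves adjacent forms a triangle at u.
            for v, w in combinations(self.adj[u], 2):
                if w in self.adj[v]:
                    counts[u] += 1
        return counts


# Simulated edge stream: triangles {1, 2, 3} and {2, 3, 4} form as edges arrive.
index = TriangleIndex()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    index.add_edge(*edge)

# The incrementally maintained counts agree with a full recount.
assert dict(index.triangles) == dict(index.recompute())
```

In this sketch, each insertion costs one set intersection over the neighbourhoods of the two endpoints, whereas the baseline revisits every vertex. A distributed implementation would additionally have to partition the graph and coordinate updates across workers, which is not attempted here.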