Conductance-based graph clustering has been recognized as a fundamental operator in numerous graph analysis applications. Despite its significant success, existing algorithms either struggle to achieve satisfactory clustering quality or incur high time and space complexity to achieve provable clustering quality. To overcome these limitations, we devise a powerful \textit{peeling}-based graph clustering framework, \textit{PCon}. We show that many existing solutions can be reduced to our framework: they first define a score function for each vertex, then iteratively remove the vertex with the smallest score, and finally output the result with the smallest conductance observed during the peeling process. Based on our framework, we propose two novel algorithms, \textit{PCon\_core} and \emph{PCon\_de}, with linear time and space complexity, which can efficiently and effectively identify clusters in massive graphs with more than a few billion edges. Surprisingly, we prove that \emph{PCon\_de} can identify clusters with a near-constant approximation ratio, an important theoretical improvement over the well-known quadratic Cheeger bound. Empirical results on real-life and synthetic datasets show that our algorithms achieve a 5$\sim$42 times speedup with high clustering accuracy, while using 1.4$\sim$7.8 times less memory than the baseline algorithms.
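The three-step peeling template described above (score each vertex, repeatedly remove the lowest-scoring vertex, keep the intermediate vertex set with the smallest conductance) can be sketched as follows. This is only an illustrative sketch, not the \textit{PCon\_core} or \emph{PCon\_de} algorithm: the function name `peel_min_conductance`, the adjacency-dictionary input, and the default score (the number of neighbors still present) are assumptions made for the example, and the naive minimum search makes it quadratic rather than linear. Conductance is taken here as the number of cut edges divided by the smaller of the two side volumes, with volume measured by degrees in the original graph.

```python
def peel_min_conductance(adj, score=None):
    """Peel vertices by ascending score; return (best_set, best_conductance).

    adj: dict mapping each vertex to the set of its neighbors
         (undirected simple graph, comparable vertex labels).
    score: hypothetical hook score(v, alive) -> number; defaults to the
         number of v's neighbors that are still present (an assumption).
    """
    if score is None:
        def score(v, alive):
            return sum(1 for u in adj[v] if u in alive)

    total_vol = sum(len(nbrs) for nbrs in adj.values())
    alive = set(adj)      # vertices not yet peeled
    vol = total_vol       # sum of original degrees over `alive`
    cut = 0               # edges with exactly one endpoint in `alive`
    best_set, best_phi = None, float("inf")

    while len(alive) > 1:
        # Naive O(n) selection; ties are broken by vertex label.
        v = min(alive, key=lambda x: (score(x, alive), x))
        alive.remove(v)
        inside = sum(1 for u in adj[v] if u in alive)
        # Edges from v to remaining vertices become cut edges;
        # edges from v to already-peeled vertices stop being cut edges.
        cut += inside - (len(adj[v]) - inside)
        vol -= len(adj[v])
        denom = min(vol, total_vol - vol)
        if denom > 0:
            phi = cut / denom
            if phi < best_phi:
                best_set, best_phi = set(alive), phi
    return best_set, best_phi


# Two triangles joined by the single bridge edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
best_set, best_phi = peel_min_conductance(adj)
print(best_set, best_phi)
```

On this toy graph the sketch peels one triangle first and returns the other triangle, whose only cut edge is the bridge, giving conductance 1/7. A different score function can be passed through the `score` argument to experiment with other peeling orders.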